Pkhdl1, a homolog of the autosomal recessive kidney disease gene

ABSTRACT

Nucleic acids encoding fibrocystin-L polypeptides and fibrocystin-L polypeptides are provided. Antibodies against the polypeptides, vectors and host cells containing the nucleic acids, methods for using the nucleic acids and polypeptides, and compositions and articles of manufacture also are provided.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. Ser. No. 11/202,548, filed Aug. 12, 2005, which is a continuation-in-part and claims benefit under 35 U.S.C. §120 of International Application No. PCT/US2004/004300 having an International Filing Date of Feb. 12, 2004, which published in English as International Publication Number WO 2004/072268, and which claims priority to U.S. Provisional Application Ser. No. 60/446,860, filed Feb. 12, 2003.

STATEMENT AS TO FEDERALLY SPONSORED RESEARCH

Funding for the work described herein was provided in part by grant nos. DK58816, DK59597, and DK59505, awarded by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). The government has certain rights in the invention.

TECHNICAL FIELD

This document relates to PKHDL1, a homolog of the autosomal recessive kidney disease gene, and more particularly, to PKHDL1 nucleic acids and polypeptides, and variants thereof.

BACKGROUND

Autosomal recessive polycystic kidney disease (ARPKD) is an important renal cause of death in the perinatal period and of childhood renal failure. Neonatal disease presentation is typical and is characterized by greatly enlarged kidneys due to fusiform dilation of collecting ducts; congenital hepatic fibrosis is often a major complication in older patients. Progress toward understanding this complex disorder has recently been made by the identification of the disease-causing gene, PKHD1, in chromosome region 6p12. PKHD1 is a very large gene (~470 kb) containing 67 exons and an open reading frame (ORF) of 12,222 bp. PKHD1 has a tissue-specific expression pattern with the highest levels in fetal and adult kidney and lower levels in liver, pancreas and lung. The murine ortholog, Pkhd1, has recently been described.

A notable feature of both the human and murine genes is that multiple different splice forms may be generated. Visualization of PKHD1 transcripts by northern analysis has proved difficult, with a smear of products often detected. These may represent multiple splice forms, unusual sensitivity of this transcript to degradation, or a combination of these factors. In situ hybridization of the murine transcript showed expression in the developing kidney and mature collecting ducts, plus ductal plate and bile ducts in the liver. Other sites of expression during development detected by in situ analysis were large vessels, testis, sympathetic ganglia, pancreas, and trachea, with evidence that some sites of expression may be of specific splice forms.

The PKHD1 encoded protein, fibrocystin, is large (4074 aa) and predicted to be an integral membrane protein with a large extracellular region and a short cytoplasmic tail. Fibrocystin is not closely related to any other characterized protein, although it contains multiple copies of a defined domain and has regions of homology to other proteins; it seems to represent the founder member of a new protein family. The only well characterized domain in fibrocystin is the TIG/IPT (immunoglobulin-like fold shared by plexins and transcription factors) that is also found in the hepatocyte growth factor receptor (HGFR), plexins and other receptor molecules. Although fibrocystin has many more copies of this domain than these other proteins, the presence of the TIG domain, along with the structure of the protein, suggested that it may also act as a receptor.

SUMMARY

The invention is based on the identification, cloning, and sequence analysis of PKHDL1 and Pkhdl1, human and murine homologs, respectively, of the ARPKD gene PKHD1. The PKHD1 homologs encode fibrocystin-L, a large receptor protein (approximately 466 kDa) that contains a signal peptide, a single transmembrane domain, and a short cytoplasmic tail. Fibrocystin-L has low, but highly significant, homology to fibrocystin over the entire length of the protein, except the extreme C-terminal region containing the predicted transmembrane domain and cytoplasmic tail. This level of homology is greater than that seen between polycystin homologs, establishing the fibrocystins as a new protein family. PKHDL1 expression is up-regulated specifically in T lymphocytes, and PKHDL1 may have a role in cellular immunity. PKHDL1 expression also is up-regulated in endometrial cancer and other cancers, including breast, ovarian, and colon cancers.

In one aspect, the invention features an isolated nucleic acid that includes a sequence encoding a fibrocystin-L polypeptide. The fibrocystin-L polypeptide can be encoded by SEQ ID NO:1 or SEQ ID NO:2, and can include the amino acid sequence of SEQ ID NO:3 or SEQ ID NO:4. The fibrocystin-L polypeptide can include an amino acid sequence variant at a position selected from the group consisting of: position 702, position 1192, position 1199, position 1223, position 1514, position 1607, position 1638, position 3050, position 3607, and position 4220 of SEQ ID NO:3. For example, the amino acid sequence variant can be selected from the group consisting of: Pro at position 702, Ala at position 1192, Ser at position 1199, Val at position 1223, Ser at position 1514, Ile at position 1607, Cys at position 1638, Gln at position 3050, Glu at position 3607, and Ile at position 4220. The isolated nucleic acid can include a sequence variant with respect to SEQ ID NO:1, e.g., a sequence variant at a position selected from the group consisting of: position 1227, position 1404, position 1920, position 1965, position 2105, position 3574, position 3599, position 3668, position 4540, position 4819, position 4913, position 6621, position 9084, position 9150, position 10821, and position 12658 of SEQ ID NO:1. The sequence variant can be selected from the group consisting of: A at position 1227, T at position 1404, C at position 1920, G at position 1965, C at position 2105, G at position 3574, C at position 3599, T at position 3668, A at position 4540, A at position 4819, G at position 4913, G at position 6621, T at position 9084, G at position 9150, A at position 10821, and A at position 12658.

In another aspect, the invention features an isolated nucleic acid encoding a fibrocystin polypeptide, wherein the nucleic acid includes at least 300 contiguous nucleotides of SEQ ID NO:1 or a sequence variant thereof. The invention also features a vector that includes such isolated nucleic acids and host cells including the vector.

The invention also features an isolated nucleic acid 10 to 1700 nucleotides in length, the nucleic acid including a sequence, the sequence including one or more sequence variants relative to the sequence of SEQ ID NO:1, wherein the sequence is at least 80% identical over its length to the corresponding sequence in SEQ ID NO:1. The sequence variant can be at a position selected from the group consisting of: position 1227, position 1404, position 1920, position 1965, position 2105, position 3574, position 3599, position 3668, position 4540, position 4819, position 4913, position 6621, position 9084, position 9150, position 10821, and position 12658 of SEQ ID NO:1.

In yet another aspect, the invention features a plurality of oligonucleotide primer pairs (e.g., at least three, 13, 16, or 23 primer pairs), wherein each primer is 10 to 50 nucleotides in length, and wherein each primer pair, in the presence of mammalian genomic DNA and under polymerase chain reaction conditions, produces a nucleic acid product corresponding to a region of a PKHDL1 nucleic acid molecule, wherein the product is 30 to 1700 nucleotides in length. The nucleic acid product can include a nucleotide sequence variant relative to SEQ ID NO:1.

The invention also features a composition that includes a first oligonucleotide primer and a second oligonucleotide primer, wherein the first oligonucleotide primer and the second oligonucleotide primer are each 10 to 50 nucleotides in length, and wherein the first and second primers, in the presence of mammalian genomic DNA and under polymerase chain reaction conditions, produce a nucleic acid product corresponding to a region of a PKHDL1 nucleic acid molecule, wherein the product is 30 to 1700 nucleotides in length. The nucleic acid product can include a nucleotide sequence variant relative to SEQ ID NO:1.

Isolated nucleic acids that include the nucleotide sequence of SEQ ID NO:1 or SEQ ID NO:2, or the complement of SEQ ID NO:1 or SEQ ID NO:2, also are featured.

In yet another aspect, the invention features an antibody having specific binding affinity for a fibrocystin-L polypeptide. Such antibodies can be used to detect fibrocystin-L in biological samples.

The invention also features a method for determining if a subject has altered cellular immunity. The method includes providing a nucleic acid sample (e.g., genomic DNA) from the subject, and determining whether the nucleic acid sample contains one or more sequence variants within the PKHDL1 gene of the subject relative to a wild-type PKHDL1 gene, wherein the presence of the one or more sequence variants is associated with altered cellular immunity in the subject. The determining step can be performed by denaturing high performance liquid chromatography or direct sequencing. The variant can be at position 2105, position 3574, position 3599, position 3668, position 4540, position 4913, or position 9150 of SEQ ID NO:1, or other positions. The method further can include identifying the sequence variant by DNA sequencing.

In yet another aspect, the invention features an article of manufacture that includes a substrate, wherein the substrate includes a population of isolated nucleic acid molecules, wherein each nucleic acid molecule is 10 to 1000 nucleotides in length, wherein each nucleic acid molecule includes a different nucleotide sequence variant relative to the sequence of SEQ ID NO:1, and wherein the nucleic acid molecule is at least 80% identical over its length to the corresponding sequence in SEQ ID NO:1.

The invention also features a method for monitoring the immune response of a patient after vaccination. The method includes a) providing a biological sample from the patient after vaccination; b) determining the number of fibrocystin-L expressing T-cells in the biological sample; and c) comparing the number of fibrocystin-L expressing T-cells to a baseline number of fibrocystin-L expressing T-cells before vaccination.

The invention also features a method for detecting endometrial cancer. The method includes detecting the level of fibrocystin-L expression in a biological sample (e.g., endometrial tissue sample) from a patient. An increase in the level of fibrocystin-L in the sample relative to the level in a corresponding control sample is indicative of the presence of endometrial cancer. Fibrocystin-L expression can be assessed by detecting the polypeptide (e.g., by immunohistochemistry, western blotting, or an ELISA) or by analysis of mRNA levels.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used to practice the invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

FIG. 1 depicts expression analysis of PKHDL1/Pkhdl1. FIG. 1A is an autoradiograph of a murine, multiple tissue (adult) Northern blot, hybridized with Pkhdl1 and showing weak smears in most lanes. FIGS. 1B-1D are RT-PCR analyses of (B) newborn and adult murine tissues with Pkhdl1 and β-actin control showing widespread low level expression of Pkhdl1; (C) human cell-lines SW13, adrenal carcinoma; ACHN, renal adenocarcinoma; HEK293, embryonic kidney; HT29, colonic adenocarcinoma; G-CCM, astrocytoma; Fib, skin fibroblasts; G-401, renal Wilms' tumor; Hep 3B, hepatoma; HeLa, cervical carcinoma; K562, erythroleukemia; lymph, EBV-transformed B-lymphocytes; and MCF-7, breast cancer, with PKHDL1 and GAPDH; and (D) various murine leukocyte populations as indicated with 30 and 40 cycles of PCR for Pkhdl1, and β-actin control. NK=natural killer cells and DCs=dendritic cells.

FIG. 2A is a Clustal W alignment of human fibrocystin (fib) (SEQ ID NO: 5) and human fibrocystin-L (fibL) (SEQ ID NO: 3) from their N-termini to positions 3849 aa and 4185 aa, respectively. Black boxes show identities and shaded boxes similarities. Conserved domains (TIG; thin lines) and defined regions of homology (TMEM; thick lines) are indicated by a dashed line for fibrocystin-L and a solid line for fibrocystin. A “♦” indicates the start and stop of the dashed line. The predicted signal peptide cleavage sites are indicated with arrowheads.

FIG. 2B is an alignment of the 14 TIG domains of fibrocystin-L (fibL) (SEQ ID NOS: 6-19) compared to fibrocystin (fib) TIG 5 (SEQ ID NO: 20), HGFR (murine) TIG 1 (SEQ ID NO: 21), plexin (murine) TIG 2 (SEQ ID NO: 22) and a receptor TIG consensus (SEQ ID NO: 23).

FIG. 2C is an alignment of the TMEM-A (A) and -B (B) regions of fibrocystin (fib) (SEQ ID NOS: 24 and 25) and fibrocystin-L (fibL) (SEQ ID NOS: 26 and 27) to a hypothetical protein from the bacterium Chloroflexus aurantiacus (chlor) (SEQ ID NO: 28) and human proteins of unknown function, TMEM2 (TMEM) (SEQ ID NO: 29) and XP051860 (51860) (SEQ ID NO: 30). These human proteins are gapped by the removal of 165 aa and 251 aa, respectively, as shown.

FIG. 3 is a diagram comparing the proposed structures of fibrocystin and fibrocystin-L.

FIG. 4 is the nucleotide sequence of the PKHDL1 cDNA (SEQ ID NO:1).

FIG. 5 is the nucleotide sequence of the Pkhdl1 cDNA (SEQ ID NO:2).

FIG. 6 is the amino acid sequence of human fibrocystin-L polypeptide (SEQ ID NO:3).

FIG. 7 is the amino acid sequence of murine fibrocystin-L polypeptide (SEQ ID NO:4).

FIG. 8A is a schematic of the N terminal fusion expression construct of fibrocystin-L (371 amino acids of fibrocystin-L).

FIG. 8B is a schematic of the PET43 vector.

FIG. 8C is a schematic of the PK-FLAG tagged fibrocystin-L expressed in PEAK cells.

FIG. 8D is a schematic of the antibody-screening strategy using PEAK cell membranes by western blot.

DETAILED DESCRIPTION

In general, the invention features PKHDL1 nucleic acids and polypeptides. As used herein, “PKHDL1” nucleic acids refers to both the human PKHDL1 gene and the murine Pkhdl1 gene. PKHDL1 nucleic acids can encode fibrocystin-L polypeptides. Identification of fibrocystin-L should greatly aid the understanding of the structure and function of the fibrocystin protein family. Furthermore, PKHDL1 nucleic acids may have a role in cellular immunity, as such nucleic acids can be specifically up-regulated in T lymphocytes following activation, and in cancers such as endometrial cancer, as expressed PKHDL1 nucleic acids are overrepresented in endometrial adenocarcinomas and fibrocystin-L expression is up-regulated in endometrial cancer relative to normal endometrial tissue.

1. Isolated PKHDL1 Nucleic Acid Molecules

As used herein, the term “nucleic acid” refers to both RNA and DNA, including cDNA, genomic DNA, and synthetic (e.g., chemically synthesized) DNA. The nucleic acid can be double-stranded or single-stranded (i.e., a sense or an antisense single strand). As used herein, “isolated nucleic acid” refers to a nucleic acid that is separated from other nucleic acid molecules that are present in a mammalian genome, including nucleic acids that normally flank one or both sides of the nucleic acid in a mammalian genome (e.g., nucleic acids that flank a PKHDL1 gene). The term “isolated” as used herein with respect to nucleic acids also includes any non-naturally-occurring nucleic acid sequence, since such non-naturally-occurring sequences are not found in nature and do not have immediately contiguous sequences in a naturally-occurring genome.

An isolated nucleic acid can be, for example, a DNA molecule, provided one of the nucleic acid sequences normally found immediately flanking that DNA molecule in a naturally-occurring genome is removed or absent. Thus, an isolated nucleic acid includes, without limitation, a DNA molecule that exists as a separate molecule (e.g., a chemically synthesized nucleic acid, or a cDNA or genomic DNA fragment produced by PCR or restriction endonuclease treatment) independent of other sequences as well as DNA that is incorporated into a vector, an autonomously replicating plasmid, a virus (e.g., a retrovirus, lentivirus, adenovirus, or herpes virus), or into the genomic DNA of a prokaryote or eukaryote. In addition, an isolated nucleic acid can include an engineered nucleic acid such as a DNA molecule that is part of a hybrid or fusion nucleic acid. A nucleic acid existing among hundreds to millions of other nucleic acids within, for example, cDNA libraries or genomic libraries, or gel slices containing a genomic DNA restriction digest, is not to be considered an isolated nucleic acid.

Isolated PKHDL1 nucleic acid molecules are at least 10 nucleotides in length (e.g., 10, 20, 50, 100, 200, 300, 400, 500, 1000, or more nucleotides in length). As described in the Examples (below), the full-length human PKHDL1 transcript contains 78 exons and is 13,081 nucleotides in length, with a coding region that is 12,729 nucleotides in length (FIG. 4; SEQ ID NO:1). The full-length murine transcript has a coding region that is 12,747 nucleotides in length (FIG. 5; SEQ ID NO:2). A PKHDL1 nucleic acid molecule is not required to contain all of the coding region listed in SEQ ID NO:1 or 2 or all of the exons; in fact, a PKHDL1 nucleic acid molecule can contain as little as a single exon (as listed in Table 2, for example) or a portion of a single exon (e.g., 10 nucleotides from a single exon). In some embodiments, the PKHDL1 transcript is alternatively spliced, which can remove a portion of an exon, a single exon, or multiple exons from the transcript. Nucleic acid molecules that are less than full-length can be useful, for example, for diagnostic purposes.

Nucleic acid molecules of the invention may have sequences identical to those found in SEQ ID NO:1 or SEQ ID NO:2. Nucleic acid molecules also can have sequences identical to those found in the complement of SEQ ID NO:1 or SEQ ID NO:2. Alternatively, the sequence of a PKHDL1 nucleic acid molecule may contain one or more variants relative to the sequences set forth in SEQ ID NO:1 or SEQ ID NO:2, or the complement of SEQ ID NO:1 or SEQ ID NO:2. As used herein, a “sequence variant” refers to any mutation that results in a difference between nucleotides at one or more positions within the nucleic acid sequence of a particular nucleic acid molecule and the nucleotides at the same positions within the corresponding wild-type sequence set forth in SEQ ID NO:1 or SEQ ID NO:2. Nucleotides are referred to herein by the standard one-letter designation (A, C, G, or T). Sequence variants can be found in coding and non-coding regions, including exons, introns, promoters, and untranslated sequences. The presence of one or more sequence variants in the PKHDL1 nucleic acid sequence of a subject can be detected as set forth below in subsection 8.

Sequence variants can be, for example, deletions, insertions, or substitutions at one or more nucleotide positions (e.g., 1, 2, 3, 10, or more than 10 positions), provided that the nucleic acid is at least 80% identical (e.g., 80%, 85%, 90%, 95%, or 99% identical) over its length to the corresponding region of the wild-type sequences set forth in SEQ ID NO:1 or SEQ ID NO:2. The human and murine coding regions are 84.1% identical. Percent sequence identity is calculated by determining the number of matched positions in aligned nucleic acid sequences, dividing the number of matched positions by the total number of aligned nucleotides, and multiplying by 100. A matched position refers to a position in which identical nucleotides occur at the same position in aligned nucleic acid sequences. Percent sequence identity also can be determined for any amino acid sequence. To determine percent sequence identity, a target nucleic acid or amino acid sequence is compared to the identified nucleic acid or amino acid sequence using the BLAST 2 Sequences (Bl2seq) program from the stand-alone version of BLASTZ containing BLASTN version 2.0.14 and BLASTP version 2.0.14. This stand-alone version of BLASTZ can be obtained from the State University of New York, Old Westbury campus library as well as at Fish & Richardson's web site (world wide web at fr.com/blast) or the U.S. government's National Center for Biotechnology Information web site (world wide web at ncbi.nlm.nih.gov/blast/executables). Instructions explaining how to use the Bl2seq program can be found in the readme file accompanying BLASTZ.

Bl2seq performs a comparison between two sequences using either the BLASTN or BLASTP algorithm. BLASTN is used to compare nucleic acid sequences, while BLASTP is used to compare amino acid sequences. To compare two nucleic acid sequences, the options are set as follows: -i is set to a file containing the first nucleic acid sequence to be compared (e.g., C:\seq1.txt); -j is set to a file containing the second nucleic acid sequence to be compared (e.g., C:\seq2.txt); -p is set to blastn; -o is set to any desired file name (e.g., C:\output.txt); -q is set to -1; -r is set to 2; and all other options are left at their default setting. The following command will generate an output file containing a comparison between two sequences: C:\Bl2seq -i c:\seq1.txt -j c:\seq2.txt -p blastn -o c:\output.txt -q -1 -r 2. If the target sequence shares homology with any portion of the identified sequence, then the designated output file will present those regions of homology as aligned sequences. If the target sequence does not share homology with any portion of the identified sequence, then the designated output file will not present aligned sequences.
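As an illustrative sketch only, the option settings described above can be assembled programmatically. The following Python fragment builds (but does not run) the argument list. The executable name `bl2seq` and the helper name `bl2seq_args` are assumptions introduced here for illustration; only the option letters and their values are taken from the description.

```python
from typing import List


def bl2seq_args(first: str, second: str, output: str,
                program: str = "blastn", q: int = -1, r: int = 2,
                executable: str = "bl2seq") -> List[str]:
    """Return an argument list mirroring the options named in the text.

    The executable name is a hypothetical placeholder; the option
    letters (-i, -j, -p, -o, -q, -r) come from the description.
    """
    return [
        executable,
        "-i", first,    # file containing the first sequence
        "-j", second,   # file containing the second sequence
        "-p", program,  # blastn for nucleic acid comparisons
        "-o", output,   # output file name
        "-q", str(q),   # set to -1 in the description
        "-r", str(r),   # set to 2 in the description
    ]


args = bl2seq_args(r"c:\seq1.txt", r"c:\seq2.txt", r"c:\output.txt")
print(" ".join(args))
```

Keeping the arguments as a list rather than a single string avoids shell quoting issues if the list is later handed to a process launcher.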

Once aligned, a length is determined by counting the number of consecutive nucleotides from the target sequence presented in alignment with sequence from the identified sequence starting with any matched position and ending with any other matched position. A matched position is any position where an identical nucleotide is presented in both the target and identified sequence. Gaps presented in the target sequence are not counted since gaps are not nucleotides. Likewise, gaps presented in the identified sequence are not counted since target sequence nucleotides are counted, not nucleotides from the identified sequence.

The percent identity over a particular length is determined by counting the number of matched positions over that length and dividing that number by the length followed by multiplying the resulting value by 100. For example, if (1) a 1000 nucleotide target sequence is compared to the sequence set forth in SEQ ID NO:1, (2) the Bl2seq program presents 200 nucleotides from the target sequence aligned with a region of the sequence set forth in SEQ ID NO:1 where the first and last nucleotides of that 200 nucleotide region are matches, and (3) the number of matches over those 200 aligned nucleotides is 180, then the 1000 nucleotide target sequence contains a length of 200 and a percent identity over that length of 90 (i.e., 180÷200×100=90).
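The length and percent identity determination described above can be sketched in Python as follows. This is a minimal illustration, not the Bl2seq implementation: it assumes the alignment is supplied as two equal-length uppercase strings with "-" marking gaps, and it takes the first and last matched positions as the endpoints (the description permits any pair of matched positions). The function name `identity_over_alignment` is hypothetical.

```python
from typing import Tuple


def identity_over_alignment(aligned_target: str,
                            aligned_identified: str) -> Tuple[int, int, float]:
    """Return (length, matches, percent identity) for one aligned region.

    Length counts target nucleotides (gaps in the target are skipped)
    from the first matched position through the last matched position.
    """
    if len(aligned_target) != len(aligned_identified):
        raise ValueError("aligned sequences must have equal length")

    # A matched position has the same nucleotide in both sequences.
    matched = [i for i, (t, s) in enumerate(zip(aligned_target, aligned_identified))
               if t == s and t != "-"]
    if not matched:
        return 0, 0, 0.0

    start, end = matched[0], matched[-1]
    # Count only target nucleotides; gap characters are not nucleotides.
    length = sum(1 for t in aligned_target[start:end + 1] if t != "-")
    matches = len(matched)
    return length, matches, 100 * matches / length


# Synthetic data reproducing the worked example: 180 matches over 200 nucleotides.
target = "A" * 200
identified = "A" * 90 + "C" * 20 + "A" * 90
print(identity_over_alignment(target, identified))
```

The synthetic sequences are placeholders chosen only to reproduce the 180-of-200 arithmetic; they are not drawn from SEQ ID NO:1.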

It will be appreciated that different regions within a single nucleic acid target sequence that aligns with an identified sequence can each have their own percent identity. It is noted that the percent identity value is rounded to the nearest tenth. For example, 78.11, 78.12, 78.13, and 78.14 are rounded down to 78.1, while 78.15, 78.16, 78.17, 78.18, and 78.19 are rounded up to 78.2. It also is noted that the length value will always be an integer.
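The rounding convention can be illustrated with a short Python sketch. Decimal arithmetic with half-up rounding is used here so that a value such as 78.15 is rounded up to 78.2 as stated; binary floating-point rounding does not guarantee that behavior. The helper names are assumptions introduced for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

TENTH = Decimal("0.1")


def round_identity(value) -> Decimal:
    """Round a percent identity value to the nearest tenth, halves rounding up."""
    return Decimal(str(value)).quantize(TENTH, rounding=ROUND_HALF_UP)


def percent_identity(matches: int, length: int) -> Decimal:
    """Compute matches / length x 100, rounded to the nearest tenth."""
    if length <= 0:
        raise ValueError("length must be a positive integer")
    return round_identity(Decimal(matches) * 100 / Decimal(length))


print(round_identity("78.14"), round_identity("78.15"))
print(percent_identity(180, 200))
```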

Sequence variants that are deletions or insertions can create frame-shifts within the coding region that alter the amino acid sequence of the encoded polypeptide, and thus can affect its structure and function.

Substitutions include silent mutations that do not affect the amino acid sequence of the encoded polypeptide, missense mutations that alter the amino acid sequence of the encoded polypeptide, and nonsense mutations that prematurely terminate and therefore truncate the encoded polypeptide. Non-limiting examples of silent mutations are included in Table 5 (e.g., A substituted for G at position 1227 of SEQ ID NO:1, T substituted for C at position 1404 of SEQ ID NO:1, C substituted for T at position 1920 of SEQ ID NO:1, G substituted for A at position 1965 of SEQ ID NO:1, G substituted for C at position 6621 of SEQ ID NO:1, and T substituted for A at position 9084 of SEQ ID NO:1). Non-limiting examples of missense mutations are included in Table 5 (e.g., G substituted for A at position 490, C substituted for A at position 2105 of SEQ ID NO:1, G substituted for A at position 3574 of SEQ ID NO:1, C substituted for T at position 3599 of SEQ ID NO:1, T substituted for G at position 3668 of SEQ ID NO:1, A substituted for C at position 4540 of SEQ ID NO:1, A substituted for G at position 4819 of SEQ ID NO:1, G substituted for A at position 4913 of SEQ ID NO:1, G substituted for C at position 9150 of SEQ ID NO:1, A substituted for C at position 10821 of SEQ ID NO:1, and A substituted for G at position 12658 of SEQ ID NO:1).
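The distinction among silent, missense, and nonsense substitutions can be sketched in Python using the standard genetic code. This is an illustration under stated assumptions: the codons used are toy examples rather than sequences read from SEQ ID NO:1, the function names are hypothetical, and `codon_number` assumes that nucleotide position 1 is the first base of the initiation codon. Under that assumed numbering, for example, nucleotide position 2105 would fall in codon 702.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, then third base in TCAG order.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_number(position: int) -> int:
    """Map a 1-based coding-sequence position to a 1-based codon number.

    Assumes position 1 is the first base of the initiation codon.
    """
    return (position - 1) // 3 + 1


def classify_substitution(codon: str, offset: int, new_base: str) -> str:
    """Classify a single-base substitution within one codon.

    offset is the 0-based index (0, 1, or 2) of the substituted base.
    """
    mutated = codon[:offset] + new_base + codon[offset + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if after == before:
        return "silent"      # encoded amino acid unchanged
    if after == "*":
        return "nonsense"    # premature stop codon
    return "missense"        # different amino acid encoded


print(classify_substitution("CAG", 2, "A"))  # CAG (Gln) -> CAA (Gln)
print(classify_substitution("CAG", 1, "C"))  # CAG (Gln) -> CCG (Pro)
print(classify_substitution("CAG", 0, "T"))  # CAG (Gln) -> TAG (stop)
```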

Deletion, insertion, and substitution sequence variants can create or destroy splice sites and thus alter the splicing of a PKHDL1 transcript, such that the encoded polypeptide contains a deletion or insertion relative to the polypeptide encoded by the corresponding wild-type nucleic acid sequences set forth in SEQ ID NO:1 or SEQ ID NO:2. Sequence variants that affect splice sites of PKHDL1 nucleic acid molecules can result in fibrocystin-L polypeptides that lack the amino acids encoded by particular exons or portions thereof.

Certain sequence variants described herein may be associated with ARPKD. Such sequence variants typically result in a change in the encoded polypeptide that can have a dramatic effect on the function of the polypeptide. These changes can include, for example, a truncation, a frame-shifting alteration, or a substitution at a highly conserved position. Conserved positions can be identified by inspection of a nucleotide or amino acid sequence alignment showing related nucleic acids or polypeptides from different species (e.g., the sequence alignments shown in FIGS. 2B and 2C). For example, the non-conservative substitution of a proline at amino acid 702 for a glutamine may be associated with ARPKD. In some ARPKD patients, the same ARPKD-associated sequence variant can be found on both alleles. In other patients, a combination of ARPKD-associated sequence variants can be found on separate alleles of an ARPKD gene.

Other sequence variants described herein include polymorphisms that occur within a normal population and typically are not associated with ARPKD. Sequence variants of this type can be, for example, nucleotide substitutions (e.g., silent mutations) that do not alter the amino acid sequence of the encoded fibrocystin-L polypeptide, or alterations that alter the amino acid sequence but that do not affect the overall function of the polypeptide. With respect to SEQ ID NO:1, sequence variants that are polymorphisms can include, for example, an A at position 4540 of SEQ ID NO:1 or an A at position 12658 of SEQ ID NO:1.

2. Production of Isolated PKHDL1 Nucleic Acid Molecules

Isolated nucleic acid molecules of the invention can be produced by standard techniques, including, without limitation, common molecular cloning and chemical nucleic acid synthesis techniques. For example, polymerase chain reaction (PCR) techniques can be used to obtain an isolated PKHDL1 nucleic acid molecule. PCR refers to a procedure or technique in which target nucleic acids are enzymatically amplified. Sequence information from the ends of the region of interest or beyond typically is employed to design oligonucleotide primers that are identical in sequence to opposite strands of the template to be amplified. PCR can be used to amplify specific sequences from DNA as well as RNA, including sequences from total genomic DNA or total cellular RNA. Primers are typically 14 to 40 nucleotides in length, but can range from 10 nucleotides to hundreds of nucleotides in length. General PCR techniques are described, for example, in PCR Primer: A Laboratory Manual, Ed. by Dieffenbach, C. and Dveksler, G., Cold Spring Harbor Laboratory Press, 1995. When using RNA as a source of template, reverse transcriptase can be used to synthesize complementary DNA (cDNA) strands. Ligase chain reaction, strand displacement amplification, self-sustained sequence replication or nucleic acid sequence-based amplification also can be used to obtain isolated nucleic acids. See, for example, Lewis (1992) Genetic Engineering News 12(9): 1; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87:1874-1878; and Weiss (1991) Science 254:1292-1293.

In one embodiment, a primer is a single-stranded or double-stranded oligonucleotide that typically is 10 to 50 nucleotides in length, and when combined with mammalian genomic DNA and subjected to PCR conditions, is capable of being extended to produce a nucleic acid product corresponding to a region of a PKHDL1 nucleic acid molecule. Typically, a PKHDL1 PCR product is 30 to 1700 nucleotides in length (e.g., 30, 35, 50, 100, 250, 500, 1000, 1500, or 1650 nucleotides in length). Specific regions of mammalian DNA can be amplified (i.e., replicated such that multiple exact copies are produced) when a pair of oligonucleotide primers is used in the same PCR reaction, wherein one primer contains a nucleotide sequence from the coding strand of a PKHDL1 nucleic acid and the other primer contains a nucleotide sequence from the non-coding strand of a PKHDL1 nucleic acid. The “coding strand” of a nucleic acid is the nontranscribed strand, which has the same nucleotide sequence as the specified RNA transcript (with the exception that the RNA transcript contains uracil in place of thymidine residues), while the “non-coding strand” of a nucleic acid is the strand that serves as the template for transcription.
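The relationship among the coding strand, the non-coding strand, the RNA transcript, and a primer pair can be sketched in Python. This is an illustration only: the sequence shown is an arbitrary toy sequence rather than a PKHDL1 sequence, the primer length is shortened for readability, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def transcript_from_coding(coding: str) -> str:
    """The transcript matches the coding strand, with U in place of T."""
    return coding.replace("T", "U")


def primer_pair(coding: str, n: int):
    """Pick a forward primer from the coding strand and a reverse primer
    from the non-coding strand, each n nucleotides long, flanking the region."""
    forward = coding[:n]                        # sequence from the coding strand
    reverse = reverse_complement(coding[-n:])   # sequence from the non-coding strand
    return forward, reverse


coding = "ATGGCCTTAGGA"  # toy coding-strand sequence, 5' to 3'
print(transcript_from_coding(coding))
print(reverse_complement(coding))  # non-coding (template) strand, 5' to 3'
print(primer_pair(coding, 4))
```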

A single PCR reaction mixture may contain one pair of oligonucleotide primers. Alternatively, a single reaction mixture may contain a plurality of oligonucleotide primer pairs, in which case multiple PCR products can be generated. Each primer pair can amplify, for example, one exon or a portion of one exon. Intron sequences also can be amplified.

Oligonucleotide primers can be incorporated into compositions. Typically, a composition of the invention will contain a first oligonucleotide primer and a second oligonucleotide primer, each 10 to 50 nucleotides in length, which can be combined with genomic DNA from a mammal and subjected to PCR conditions as set out below, to produce a nucleic acid product that corresponds to a region of a PKHDL1 nucleic acid molecule. A composition also may contain buffers and other reagents necessary for PCR (e.g., DNA polymerase or nucleotides). Furthermore, a composition may contain one or more additional pairs of oligonucleotide primers (e.g., 3, 13, 16, or 23 primer pairs), such that multiple nucleic acid products can be generated.

Specific PCR conditions typically are defined by the concentration of salts (e.g., MgCl₂) in the reaction buffer, and by the temperatures utilized for melting, annealing, and extension. Specific concentrations or amounts of primers, templates, deoxynucleotides (dNTPs), and DNA polymerase also may be set out. For example, PCR conditions with a buffer containing 2.5 mM MgCl₂, and melting, annealing, and extension temperatures of 94° C., 44-65° C., and 72° C., respectively, are particularly useful. Under such conditions, a PCR sample can include, for example, 60 ng genomic DNA, 8 mM each primer, 200 pM dNTPs, 1 U DNA polymerase (e.g., AmpliTaq Gold), and the appropriate amount of buffer as specified by the manufacturer of the polymerase (e.g., 1× AmpliTaq Gold buffer). Denaturation, annealing, and extension each may be carried out for 30 seconds per cycle, with a total of 25 to 35 cycles, for example. An initial denaturation step (e.g., 94° C. for 2 minutes) and a final elongation step (e.g., 72° C. for 10 minutes) also may be useful.
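The cycling parameters described above can be represented as a simple data structure, as in the following Python sketch. The default annealing temperature of 55° C. and the default of 30 cycles are arbitrary choices within the stated ranges, the class and function names are hypothetical, and the computed time covers programmed hold times only (instrument ramp times between temperatures are ignored).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class PcrProgram:
    initial: Step
    cycle: List[Step]
    cycles: int
    final: Step

    def hold_seconds(self) -> int:
        """Total programmed hold time, excluding temperature ramping."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.cycles * per_cycle + self.final.seconds


def build_program(anneal_c: float = 55.0, cycles: int = 30) -> PcrProgram:
    """Build a program from the example conditions given in the text."""
    if not 44 <= anneal_c <= 65:
        raise ValueError("annealing temperature outside the 44-65 C range")
    if not 25 <= cycles <= 35:
        raise ValueError("cycle count outside the 25-35 range")
    return PcrProgram(
        initial=Step("initial denaturation", 94, 120),
        cycle=[
            Step("melting", 94, 30),
            Step("annealing", anneal_c, 30),
            Step("extension", 72, 30),
        ],
        cycles=cycles,
        final=Step("final elongation", 72, 600),
    )


program = build_program()
print(program.hold_seconds() / 60, "minutes of programmed hold time")
```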

Isolated nucleic acids of the invention also can be chemically synthesized, either as a single nucleic acid molecule (e.g., using automated DNA synthesis in the 3′ to 5′ direction using phosphoramidite technology) or as a series of oligonucleotides. For example, one or more pairs of long oligonucleotides (e.g., >100 nucleotides) can be synthesized that contain the desired sequence, with each pair containing a short segment of complementarity (e.g., about 15 nucleotides) such that a duplex is formed when the oligonucleotide pair is annealed. DNA polymerase is used to extend the oligonucleotides, resulting in a single, double-stranded nucleic acid molecule per oligonucleotide pair, which then can be ligated into a vector.

Isolated nucleic acids of the invention also can be obtained by mutagenesis. For example, the reference sequence depicted in FIG. 4 or 5 can be mutated using standard techniques including oligonucleotide-directed mutagenesis and site-directed mutagenesis through PCR. See, Short Protocols in Molecular Biology, Chapter 8, Green Publishing Associates and John Wiley & Sons, Edited by Ausubel et al., 1992. Examples of positions that can be modified are described above and in Table 5, as well as in the alignments of FIGS. 2B and 2C.

3. Vectors and Host Cells

The invention also provides vectors containing nucleic acids such as those described above. As used herein, a “vector” is a replicon, such as a plasmid, phage, or cosmid, into which another DNA segment may be inserted so as to bring about the replication of the inserted segment. The vectors of the invention can be expression vectors. An “expression vector” is a vector that includes one or more expression control sequences, and an “expression control sequence” is a DNA sequence that controls and regulates the transcription and/or translation of another DNA sequence.

In the expression vectors of the invention, the nucleic acid is operably linked to one or more expression control sequences. As used herein, “operably linked” means incorporated into a genetic construct so that expression control sequences effectively control expression of a coding sequence of interest. Examples of expression control sequences include promoters, enhancers, and transcription terminating regions. A promoter is an expression control sequence composed of a region of a DNA molecule, typically within 100 nucleotides upstream of the point at which transcription starts (generally near the initiation site for RNA polymerase II). To bring a coding sequence under the control of a promoter, it is necessary to position the translation initiation site of the translational reading frame of the polypeptide between one and about fifty nucleotides downstream of the promoter. Suitable promoters can be tissue-specific (i.e., capable of directing expression of a nucleic acid preferentially in a particular cell type). Non-limiting examples of tissue-specific promoters include the lymphoid-specific promoters (Calame and Eaton (1988), Adv. Immunol., 43:235-257) and T-cell specific promoters (Winoto and Baltimore (1989), EMBO J., 8:729-733). Enhancers provide expression specificity in terms of time, location, and level. Unlike promoters, enhancers can function when located at various distances from the transcription site. An enhancer also can be located downstream from the transcription initiation site. A coding sequence is “operably linked” and “under the control” of expression control sequences in a cell when RNA polymerase is able to transcribe the coding sequence into mRNA, which then can be translated into the protein encoded by the coding sequence.

Suitable expression vectors include, without limitation, plasmids and viral vectors derived from, for example, bacteriophage, baculoviruses, tobacco mosaic virus, herpes viruses, cytomegalovirus, retroviruses, vaccinia viruses, adenoviruses, and adeno-associated viruses. Numerous vectors and expression systems are commercially available from such corporations as Novagen (Madison, Wis.), Clontech (Palo Alto, Calif.), Stratagene (La Jolla, Calif.), and Invitrogen/Life Technologies (Carlsbad, Calif.). For example, the pET-43a⁺ vector from Novagen can be used.

An expression vector can include a tag sequence designed to facilitate subsequent manipulation of the expressed nucleic acid sequence (e.g., purification or localization). Tag sequences, such as green fluorescent protein (GFP), glutathione S-transferase (GST), polyhistidine, c-myc, hemagglutinin, or Flag™ tag (Kodak, New Haven, Conn.) sequences typically are expressed as a fusion with the encoded polypeptide. Such tags can be inserted anywhere within the polypeptide including at either the carboxyl or amino terminus.

The invention also provides host cells containing vectors of the invention. The term “host cell” is intended to include prokaryotic and eukaryotic cells into which a recombinant expression vector can be introduced. As used herein, “transformed” and “transfected” encompass the introduction of a nucleic acid molecule (e.g., a vector) into a cell by one of a number of techniques. Although not limited to a particular technique, a number of these techniques are well established within the art. Prokaryotic cells can be transformed with nucleic acids by, for example, electroporation or calcium chloride mediated transformation. Nucleic acids can be transfected into mammalian cells by techniques including, for example, calcium phosphate co-precipitation, DEAE-dextran-mediated transfection, lipofection, electroporation, or microinjection. Suitable methods for transforming and transfecting host cells are found in Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd edition), Cold Spring Harbor Laboratory, New York (1989), and reagents for transformation and/or transfection are commercially available (e.g., Lipofectin (Invitrogen/Life Technologies); Fugene (Roche, Indianapolis, Ind.); and SuperFect (Qiagen, Valencia, Calif.)).

In one embodiment, host cells are T-cells that have been isolated from a subject. Such cells can be manipulated ex vivo (e.g., by transfecting with a vector that encodes a fibrocystin-L polypeptide as described above) then re-introduced into the subject to augment the subject's immune responses to, for example, an infectious disease or cancer, or as adjuvant therapy following an allogeneic bone marrow transplant (i.e., donor lymphocyte infusion). In other embodiments, the host cells are PEAK cells, a human embryonic kidney (HEK)-293 derivative selected for high transfection frequency (Edge Biosystems, Gaithersburg, Md.).

4. Fibrocystin-L Polypeptides

The invention provides purified fibrocystin-L polypeptides that are encoded by the PKHDL1 nucleic acid molecules of the invention. A “polypeptide” refers to a chain of at least 10 amino acid residues (e.g., 10, 20, 50, 75, 100, 200, or more than 200 residues), regardless of post-translational modification (e.g., phosphorylation or glycosylation). Typically, a fibrocystin-L polypeptide of the invention is capable of eliciting a fibrocystin-L-specific antibody response (i.e., is able to act as an immunogen that induces the production of antibodies capable of specific binding to fibrocystin-L).

A fibrocystin-L polypeptide may have an amino acid sequence that is identical to that of SEQ ID NO:3 or SEQ ID NO:4. Alternatively, a fibrocystin-L polypeptide can include an amino acid sequence variant. As used herein, an amino acid sequence variant refers to a deletion, insertion, or substitution at one or more amino acid positions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 positions), provided that the polypeptide has an amino acid sequence that is at least 80% identical (e.g., 80%, 85%, 90%, 95%, or 99% identical) over its length to the corresponding region of the sequences set forth in SEQ ID NO:3 or SEQ ID NO:4.

Percent sequence identity is calculated by determining the number of matched positions in aligned amino acid sequences, dividing the number of matched positions by the total number of aligned amino acids, and multiplying by 100. The percent identity between amino acid sequences therefore is calculated in a manner analogous to the method for calculating the identity between nucleic acid sequences, using the B12seq program from the stand-alone version of BLASTZ containing BLASTN version 2.0.14 and BLASTP version 2.0.14; see subsection 1, above. A matched position refers to a position in which identical residues occur at the same position in aligned amino acid sequences. To compare two amino acid sequences, the options of B12seq are set as follows: −i is set to a file containing the first amino acid sequence to be compared (e.g., C:\seq1.txt); −j is set to a file containing the second amino acid sequence to be compared (e.g., C:\seq2.txt); −p is set to blastp; −o is set to any desired file name (e.g., C:\output.txt); and all other options are left at their default setting. The following command will generate an output file containing a comparison between two amino acid sequences: C:\B12seq −i c:\seq1.txt −j c:\seq2.txt −p blastp −o c:\output.txt. If the target sequence shares homology with any portion of the identified sequence, then the designated output file will present those regions of homology as aligned sequences. If the target sequence does not share homology with any portion of the identified sequence, then the designated output file will not present aligned sequences.

Once aligned, a length is determined by counting the number of consecutive amino acid residues from the target sequence presented in alignment with sequence from the identified sequence starting with any matched position and ending with any other matched position. A matched position is any position where an identical amino acid residue is presented in both the target and identified sequence. Gaps presented in the target sequence are not counted since gaps are not amino acid residues. Likewise, gaps presented in the identified sequence are not counted since target sequence amino acid residues are counted, not amino acid residues from the identified sequence.

The percent identity over a particular length is determined by counting the number of matched positions over that length and dividing that number by the length followed by multiplying the resulting value by 100. For example, if (1) a 1000 amino acid target sequence is compared to the sequence set forth in SEQ ID NO:3, (2) the B12seq program presents 200 amino acids from the target sequence aligned with a region of the sequence set forth in SEQ ID NO:3 where the first and last amino acids of that 200 amino acid region are matches, and (3) the number of matches over those 200 aligned amino acids is 180, then the 1000 amino acid target sequence contains a length of 200 and a percent identity over that length of 90 (i.e., 180÷200×100=90). As described for aligned nucleic acids in subsection 1, different regions within a single amino acid target sequence that aligns with an identified sequence can each have their own percent identity. It also is noted that the percent identity value is rounded to the nearest tenth, and the length value will always be an integer.
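The length and percent identity calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the alignment program itself: it assumes that the two aligned sequences are supplied as equal-length strings in which gaps are written as “-”, it spans the region from the first matched position to the last matched position, and the function name `identity_over_length` is hypothetical.

```python
from typing import Tuple


def identity_over_length(target_aln: str, identified_aln: str,
                         gap: str = "-") -> Tuple[int, float]:
    """Return (length, percent identity) for one aligned region.

    length  = number of target residues (gaps excluded) from the first
              matched position through the last matched position
    percent = matches / length * 100, rounded to the nearest tenth
    """
    if len(target_aln) != len(identified_aln):
        raise ValueError("aligned sequences must have the same number of columns")

    # Columns where both sequences show the same residue (a gap never matches).
    matched = [i for i, (t, s) in enumerate(zip(target_aln, identified_aln))
               if t != gap and t == s]
    if not matched:
        return 0, 0.0

    first, last = matched[0], matched[-1]
    # Count target residues only; gap characters in the target are skipped.
    length = sum(1 for t in target_aln[first:last + 1] if t != gap)
    return length, round(100.0 * len(matched) / length, 1)


if __name__ == "__main__":
    # Reproduces the worked example: 180 matches over 200 aligned target residues.
    target = "A" * 90 + "G" * 20 + "A" * 90
    identified = "A" * 90 + "C" * 20 + "A" * 90
    print(identity_over_length(target, identified))
```

Because only target residues are counted, a column in which the identified sequence has a gap still contributes one residue to the length, while a column in which the target has a gap contributes nothing.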

The deletion of amino acids from a fibrocystin-L polypeptide or the insertion of amino acids into a fibrocystin-L polypeptide can significantly affect the structure of the polypeptide. A deletion can result in a fibrocystin-L polypeptide that is truncated. Amino acids also may be deleted from a fibrocystin-L polypeptide as a result of altered splicing (see subsection 1, above).

Amino acid substitutions may be conservative or non-conservative. Conservative amino acid substitutions replace an amino acid with an amino acid of the same class, whereas non-conservative amino acid substitutions replace an amino acid with an amino acid of a different class. Conservative amino acid substitutions typically have little effect on the structure or function of a polypeptide. Examples of conservative substitutions include amino acid substitutions within the following groups: glycine and alanine; valine, isoleucine, and leucine; aspartic acid and glutamic acid; asparagine, glutamine, serine, and threonine; lysine, histidine, and arginine; and phenylalanine and tyrosine. Conservative substitutions within a fibrocystin-L polypeptide can include, for example, Ile substituted for Val at amino acid position 1607 of SEQ ID NO:3, Glu substituted for Asp at amino acid position 3607 of SEQ ID NO:3, and Ile substituted for Val at amino acid position 4220 of SEQ ID NO:3.

Non-conservative substitutions may result in a substantial change in the hydrophobicity of the polypeptide or in the bulk of a residue side chain. In addition, non-conservative substitutions may make a substantial change in the charge of the polypeptide, such as reducing electropositive charges or introducing electronegative charges. Examples of non-conservative substitutions include a basic amino acid for a non-polar amino acid, or a polar amino acid for an acidic amino acid. Non-conservative substitutions within a fibrocystin-L polypeptide can include, for example, Pro substituted for Gln at amino acid position 702 of SEQ ID NO:3, Ala substituted for Thr at amino acid position 1192 of SEQ ID NO:3, Ser substituted for Leu at amino acid position 1199 of SEQ ID NO:3, Val substituted for Gly at position 1223 of SEQ ID NO:3, Ser substituted for Arg at position 1514 of SEQ ID NO:3, Cys substituted for Tyr at position 1638 of SEQ ID NO:3, or Gln substituted for His at position 3050 of SEQ ID NO:3.
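The distinction between conservative and non-conservative substitutions can be illustrated with a short Python sketch that encodes the six example groups of conservative substitutions listed above. The sketch is illustrative only. It assumes three-letter residue codes, and it treats any residue not named in one of the six groups (e.g., Pro, Cys, Met, or Trp) as forming its own class, which is a simplifying assumption rather than a statement from this description. The names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical.

```python
# The six example groups of conservative substitutions, as three-letter codes.
CONSERVATIVE_GROUPS = [
    {"Gly", "Ala"},
    {"Val", "Ile", "Leu"},
    {"Asp", "Glu"},
    {"Asn", "Gln", "Ser", "Thr"},
    {"Lys", "His", "Arg"},
    {"Phe", "Tyr"},
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same example group.

    Residues absent from every group are treated as their own class
    (an assumption of this sketch), so any change involving them is
    reported as non-conservative.
    """
    if original == replacement:
        raise ValueError("identical residues are not a substitution")
    return any(original in group and replacement in group
               for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    # "Ile substituted for Val" means the original residue is Val.
    print(is_conservative("Val", "Ile"))   # same group
    print(is_conservative("Gln", "Pro"))   # Pro is in none of the groups
```

Applied to the examples given above, the Val to Ile and Asp to Glu changes fall within a single group, whereas the Gln to Pro, Thr to Ala, Leu to Ser, Gly to Val, Arg to Ser, Tyr to Cys, and His to Gln changes do not.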

The term “purified” as used herein with reference to a polypeptide refers to a polypeptide that either has no naturally occurring counterpart (e.g., a peptidomimetic), has been chemically synthesized and is thus uncontaminated by other polypeptides, or has been separated or purified from other cellular components by which it is naturally accompanied (e.g., other cellular proteins, polynucleotides, or cellular components). Typically, the polypeptide is considered “purified” when it is at least 70% (e.g., 70%, 80%, 90%, 95%, or 99%), by dry weight, free from the proteins and naturally occurring organic molecules with which it naturally associates.

Fibrocystin-L polypeptides typically contain multiple functional domains (e.g., two or more regions that are responsible for a specific function of the polypeptide). A fibrocystin-L polypeptide may contain one or more transmembrane (TM) domains, such that part of the polypeptide is cytoplasmic and part is extracellular. Such a domain can be located, for example, between amino acid residues 4213 and 4235 of SEQ ID NO:3, such that the full length fibrocystin-L polypeptide has a large N-terminal extracellular region and a short, 8 amino acid C-terminal cytoplasmic tail. In order to facilitate insertion of the polypeptide into the cellular membrane, a fibrocystin-L polypeptide also may include a hydrophobic signal peptide (e.g., the 20 amino acid residues at the N-terminus). Additionally, a fibrocystin-L polypeptide can contain one or more (e.g., 3, 11, or 14) TIG/IPT domains (Transcription-associated ImmunoGlobulin domain/Immunoglobulin-like fold shared by Plexins and Transcription factors; referred to hereafter as TIG domains), similar to those found in fibrocystin, the hepatocyte growth factor receptor, plexins, and the macrophage-stimulating protein receptor. TIG domains can be located anywhere within the polypeptide, although localization within the N-terminal 50% of a fibrocystin-L polypeptide is particularly common. Fibrocystin-L polypeptides also can have one or more TMEM2 regions of homology (e.g., residues 2180-2375 or 3032-3376 of SEQ ID NO:3). Furthermore, a fibrocystin-L polypeptide can contain one or more sites for N-glycosylation (e.g., 56 N-glycosylation sites in the N-terminal region). A fibrocystin-L polypeptide also may contain sites (e.g., in the C-terminal tail) for phosphorylation by protein kinase A and/or protein kinase C.

5. Production of Fibrocystin-L Polypeptides

Fibrocystin-L polypeptides can be produced by a number of methods, many of which are well known in the art. By way of example and not limitation, fibrocystin-L polypeptides can be obtained by extraction from a natural source (e.g., from isolated cells, tissues or bodily fluids), by expression of a recombinant nucleic acid encoding the polypeptide, or by chemical synthesis.

Fibrocystin-L polypeptides of the invention can be produced by, for example, standard recombinant technology, using expression vectors encoding fibrocystin-L polypeptides. The resulting fibrocystin-L polypeptides then can be purified. Expression systems that can be used for small or large scale production of fibrocystin-L polypeptides include, without limitation, microorganisms such as bacteria (e.g., E. coli and B. subtilis) transformed with recombinant bacteriophage DNA, plasmid DNA, or cosmid DNA expression vectors containing the nucleic acid molecules of the invention; yeast (e.g., S. cerevisiae) transformed with recombinant yeast expression vectors containing the nucleic acid molecules of the invention; insect cell systems infected with recombinant virus expression vectors (e.g., baculovirus) containing the nucleic acid molecules of the invention; plant cell systems infected with recombinant virus expression vectors (e.g., tobacco mosaic virus) or transformed with recombinant plasmid expression vectors (e.g., Ti plasmid) containing the nucleic acid molecules of the invention; or mammalian cell systems (e.g., primary cells or immortalized cell lines such as COS cells, Chinese hamster ovary cells, HeLa cells, HEK-293 cells, PEAK cells, and 3T3 L1 cells) harboring recombinant expression constructs containing promoters derived from the genome of mammalian cells (e.g., the metallothionein promoter) or from mammalian viruses (e.g., the adenovirus late promoter and the cytomegalovirus promoter), along with the nucleic acids of the invention.

Suitable methods for purifying the polypeptides of the invention can include, for example, affinity chromatography, immunoprecipitation, size exclusion chromatography, and ion exchange chromatography. See, for example, Flohe et al. (1970) Biochim. Biophys. Acta. 220:469-476, or Tilgmann et al. (1990) FEBS 264:95-99. The extent of purification can be measured by any appropriate method, including but not limited to: column chromatography, polyacrylamide gel electrophoresis, or high-performance liquid chromatography. Fibrocystin-L polypeptides also can be “engineered” to contain a tag sequence described herein that allows the polypeptide to be purified (e.g., captured onto an affinity matrix). Immunoaffinity chromatography also can be used to purify fibrocystin-L polypeptides.

6. Anti-Fibrocystin-L Antibodies

The invention also provides antibodies having specific binding affinity for fibrocystin-L polypeptides. Such antibodies can be useful for diagnostic purposes (e.g., an antibody that recognizes a specific fibrocystin-L variant could be used to determine if a subject's cellular immunity is compromised or to detect endometrial cancer). An antibody having specific binding affinity for a fibrocystin-L polypeptide also can be used to prevent the development of an autoimmune disease in a subject at risk for autoimmunity or to treat an autoimmune disease (e.g., thyroiditis, inflammatory bowel disease, asthma, rheumatoid arthritis, systemic lupus erythematosus (SLE), or type I diabetes) in a subject. For example, an antibody such as a monoclonal antibody can be administered to a subject that has antibodies against pancreatic islet antigens and a family history of type I diabetes to prevent the development of diabetes. An antibody having specific binding affinity for a fibrocystin-L polypeptide also can be administered to a subject to prevent or treat rejection of an organ or tissue transplant.

An “antibody” or “antibodies” includes intact molecules as well as fragments thereof that are capable of binding to an epitope of a fibrocystin-L polypeptide. The term “epitope” refers to an antigenic determinant on an antigen to which an antibody binds. Epitopes usually consist of chemically active surface groupings of molecules such as amino acids or sugar side chains, and typically have specific three-dimensional structural characteristics, as well as specific charge characteristics. Epitopes generally have at least five contiguous amino acids. The terms “antibody” and “antibodies” include polyclonal antibodies, monoclonal antibodies, humanized or chimeric antibodies, single chain Fv antibody fragments, Fab fragments, and F(ab′)₂ fragments. Suitable “antibody” or “antibodies” can be of any isotype. Polyclonal antibodies are heterogeneous populations of antibody molecules that are specific for a particular antigen, while monoclonal antibodies are homogeneous populations of antibodies to a particular epitope contained within an antigen. Monoclonal antibodies are particularly useful.

In general, a fibrocystin-L polypeptide is produced as described above, i.e., recombinantly, by chemical synthesis, or by purification of the native protein, and then used to immunize animals. Various host animals including, for example, rabbits, chickens, mice, guinea pigs, and rats, can be immunized by injection of the protein of interest. Depending on the host species, adjuvants can be used to increase the immunological response and include Freund's adjuvant (complete and/or incomplete), mineral gels such as aluminum hydroxide, surface-active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, and dinitrophenol. Polyclonal antibodies are contained in the sera of the immunized animals. Monoclonal antibodies can be prepared using standard hybridoma technology. In particular, monoclonal antibodies can be obtained by any technique that provides for the production of antibody molecules by continuous cell lines in culture as described, for example, by Kohler et al. (1975) Nature 256:495-497, the human B-cell hybridoma technique of Kosbor et al. (1983) Immunology Today 4:72, and Cote et al. (1983) Proc. Natl. Acad. Sci. USA 80:2026-2030, and the EBV-hybridoma technique of Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc. pp. 77-96 (1983). Such antibodies can be of any immunoglobulin class including IgM, IgG, IgE, IgA, IgD, and any subclass thereof. The hybridoma producing the monoclonal antibodies of the invention can be cultivated in vitro or in vivo.

A chimeric antibody is a molecule in which different portions are derived from different animal species, such as those having a variable region derived from a mouse monoclonal antibody and a human immunoglobulin constant region. Chimeric antibodies can be produced through standard techniques.

Antibody fragments that have specific binding affinity for fibrocystin-L polypeptides can be generated by known techniques. Such antibody fragments include, but are not limited to, F(ab′)₂ fragments that can be produced by pepsin digestion of an antibody molecule, and Fab fragments that can be generated by reducing the disulfide bridges of F(ab′)₂ fragments. Alternatively, Fab expression libraries can be constructed. See, for example, Huse et al. (1989) Science 246:1275-1281. Single chain Fv antibody fragments are formed by linking the heavy and light chain fragments of the Fv region via an amino acid bridge (e.g., 15 to 18 amino acids), resulting in a single chain polypeptide. Single chain Fv antibody fragments can be produced through standard techniques, such as those disclosed in U.S. Pat. No. 4,946,778.

Once produced, antibodies or fragments thereof can be tested for recognition of a fibrocystin-L polypeptide by standard immunoassay methods including, for example, enzyme-linked immunosorbent assay (ELISA) or radioimmunoassay (RIA). See, Short Protocols in Molecular Biology, eds. Ausubel et al., Green Publishing Associates and John Wiley & Sons (1992). Suitable antibodies typically have equal binding affinities for recombinant and native proteins. As described herein, monoclonal antibodies FibLA-2, FibLA-10.1, FibLA-10.2, FibLA-4.1, FibLA-4.2, FibLA-11.1, FibLA-11.2, FibLA-11.3, FibLA-13.1, and FibLA-13.2 are useful for detecting fibrocystin-L in tissues and cell lines.

7. Methods for Using PKHDL1 Nucleic Acid Molecules, Fibrocystin-L Polypeptides, and Fibrocystin-L Expressing Cells

As described herein, fibrocystin-L is widely expressed at a low level across many human tissues, including kidney (proximal and distal tubules), fallopian tube, thyroid, liver, adrenal cortex, gallbladder, testis, breast, and spleen. It is strongly expressed in fallopian tube, which is known to be heavily ciliated. It also appears to be upregulated in endometrial adenocarcinomas compared with levels detected in normal endometrium. Immunostaining of endometrium and fallopian tubes indicated that the protein has an apical distribution and is present on the apical cilia of endometrial epithelial cells, and that it appears to mislocalize apically in some cancers. Other hormone dependent cancers (e.g., breast and ovarian) also had up-regulated levels of fibrocystin-L, as did some lung and colon cancers. The tissue distribution of fibrocystin-L elsewhere seems to correlate with known ciliated epithelial surfaces (e.g., thyroid, kidney, lung, adrenal cortex, and pituitary). The findings of localized up-regulation (shown, for example, by increased immunoreactivity and relative protein expression levels by western blotting, and by confocal microscopy), as well as the abundance of PKHDL1 cDNAs in endometrial cancer, suggest that there is a connection between this gene and endometrial carcinogenesis and other cancers.

Fibrocystin-L also is expressed in both human and mouse T-lymphocytes (T-cells) and is up-regulated following human and mouse T-cell activation. Activated T-cells are cells that have been recently stimulated, accumulate at sites of ongoing immune activity, and are usually depleted once a disease process has been eliminated. In autoimmune diseases or rejection of a transplant, inappropriate accumulation and persistence of activated T-cells can result in tissue injury. Fibrocystin-L also is present in higher amounts in CD4⁺ (helper) T-cells than in CD8⁺ (killer) T-cells and present in higher amounts in memory T-cells than in naïve helper T-cells. Naïve cells have not been previously activated, require a strong stimulus to undergo activation, typically require a period of days to become fully activated, and do not circulate through most organs. Memory T-cells are long-lasting T-cells that have been activated previously, are capable of rapid re-activation upon receipt of a new stimulus (e.g., re-exposure to the same infectious agent), and circulate through most organs and tissues. The generation of memory T-cells is essential for successful vaccination against many infections. As such, fibrocystin-L may play a role in the function of memory and activated T-cells. The genetic programs that are initiated by T-cell interactions with antigen-presenting cells (APCs) bearing cognate antigen regulate a host of specialized functions including T-cell proliferation, cytokine production, migration patterns, cytotoxicity, and cell survival. Coordination of these functions is essential for elimination of infection, immune surveillance against neoplasia, and generation of memory responses. Furthermore, aberrant T-cell activation underlies the pathogenesis of autoimmunity and rejection of transplanted organs and tissues. Fibrocystin-L may play a role in the regulation of the T-cell-APC interface structure or in adhesion and migration patterns.

Thus, detecting PKHDL1 nucleic acids or nucleic acid sequence variants thereof, or fibrocystin-L polypeptides, can be useful for characterizing immune responses in subjects, or for diagnosing, preventing, or treating autoimmune disease, transplant rejection, infectious disease, or cancer (e.g., endometrial, breast, ovarian, lung, or colon cancer). For example, PKHDL1 mutation and polymorphism analysis can be used to identify individuals at greater or lesser risk for, e.g., immune-mediated diseases or cancer.

PKHDL1 nucleic acids or nucleic acid sequence variants thereof, or fibrocystin-L polypeptides, also can be detected as a marker of endometrial cancer and other cancers, including breast, ovarian, lung, and colon cancer, or to grade endometrial or other cancers. For example, monoclonal antibody FibLA-2, FibLA-10.1, FibLA-10.2, FibLA-4.1, FibLA-4.2, FibLA-11.1, FibLA-11.2, FibLA-11.3, FibLA-13.1, or FibLA-13.2 can be used to detect fibrocystin-L in a biological sample such as a tissue sample from the endometrium, breast, ovary, lung, or colon. In some embodiments, endometrial or other cancers can be graded by detecting the level of fibrocystin-L expression in tumor tissues using, for example, immunohistochemistry, western blotting, or an ELISA to detect the polypeptide or by analysis of mRNA levels. In other embodiments, endometrial or other cancers can be graded by detecting the level of secreted fibrocystin-L or fragments thereof in blood serum, urine or other bodily fluid. Fibrocystin-L can be detected in combination with another marker for a particular cancer (e.g., PTEN or p53) or in combination with determining the status of a hormone receptor (e.g., estrogen or progesterone receptor status).

Furthermore, fibrocystin-L expressing T-cells can be detected and/or quantified in the blood to measure, for example, immune responses to vaccination or the nature and severity of autoimmune disease or transplant rejection. In general, the number of fibrocystin-L expressing T-cells in the blood will be increased after vaccination when compared with the baseline number of fibrocystin-L expressing T-cells in the blood before vaccination. Similarly, with respect to autoimmune disease or transplant rejection, the number of fibrocystin-L expressing T-cells will be increased relative to a baseline number of fibrocystin-L expressing T-cells in a control population (e.g., control subjects without autoimmune disease or subjects that have not undergone a transplant). Fibrocystin-L expressing T-cells also can be detected and/or quantified in tissue biopsy specimens to assess the nature and severity of autoimmune disease, transplant rejection, or a cancer-specific cellular immune response. Standard techniques can be used to detect and/or quantitate the number of fibrocystin-L expressing T-cells that are present in a biological sample (e.g., a blood or tissue sample).

8. Methods of Detecting Sequence Variants

Methods of the invention can be utilized to determine whether the PKHDL1 gene of a subject contains a sequence variant or combination of sequence variants. Furthermore, methods of the invention can be used to determine whether both PKHDL1 alleles of a subject contain sequence variants (either the same sequence variant(s) on both alleles or separate sequence variants on each allele), or whether only a single allele of a subject contains sequence variants.

Sequence variants within a PKHDL1 nucleic acid can be detected by a number of methods. Sequence variants can be detected by, for example, sequencing exons, introns, or untranslated sequences, denaturing high performance liquid chromatography (DHPLC; Underhill et al. (1997) Genome Res. 7:996-1005), allele-specific hybridization (Stoneking et al. (1991) Am. J. Hum. Genet. 48:370-382; and Prince et al. (2001) Genome Res. 11(1):152-162), allele-specific restriction digests, mutation specific polymerase chain reactions, single-stranded conformational polymorphism detection (Schafer et al. (1998) Nat. Biotechnol. 15:33-39), infrared matrix-assisted laser desorption/ionization mass spectrometry (WO 99/57318), and combinations of such methods.

Genomic DNA generally is used in the analysis of PKHDL1 sequence variants. Genomic DNA typically is extracted from a biological sample such as a peripheral blood sample, but can be extracted from other biological samples, including tissues (e.g., mucosal scrapings of the lining of the mouth or from renal or hepatic tissue). Routine methods can be used to extract genomic DNA from a blood or tissue sample, including, for example, phenol extraction. Alternatively, genomic DNA can be extracted with kits such as the QIAamp® Tissue Kit (Qiagen, Chatsworth, Calif.), the Wizard® Genomic DNA purification kit (Promega, Madison, Wis.), the Puregene DNA Isolation System (Gentra Systems, Inc., Minneapolis, Minn.), and the A.S.A.P.™ Genomic DNA isolation kit (Boehringer Mannheim, Indianapolis, Ind.).

Typically, an amplification step is performed before proceeding with the detection method. For example, exons or introns of the PKHDL1 gene can be amplified and then directly sequenced. Dye primer sequencing can be used to increase the accuracy of detecting heterozygous samples.

PKHDL1 sequence variants can be detected by, for example, DHPLC analysis of PKHDL1 nucleic acid molecules. Genomic DNA can be isolated from a subject (e.g., a human, a mouse, or a rat), and sequences from one or more regions of an ARPKD gene can be amplified (e.g., by PCR) using specific pairs of oligonucleotide primers (e.g., as described above in subsection 2). After amplification, PCR products can be denatured and reannealed, such that an allele containing a PKHDL1 sequence variant can reanneal with a wild-type allele to form a heteroduplex (i.e., a double-stranded nucleic acid with a mismatch at one or more positions). The reannealed products then can be subjected to DHPLC, which detects heteroduplexes based on their altered melting temperatures, as compared to homoduplexes that do not contain mismatches. Samples containing heteroduplexes can be sequenced by standard methods to specifically identify the variant nucleotides. Examples of specific sequence variants are provided in Table 5 below.
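The heteroduplex principle described above can be illustrated with a minimal sketch. The two short sequences are made up for illustration and are not PKHDL1 sequences, and the function names are hypothetical. The sketch only reports where two aligned alleles differ, which is where a reannealed heteroduplex would carry a mismatch; it does not model melting behavior or DHPLC elution.

```python
def mismatch_positions(allele_a, allele_b):
    """Return 1-based positions at which two aligned, equal-length alleles differ."""
    if len(allele_a) != len(allele_b):
        raise ValueError("this simple comparison needs equal-length sequences")
    pairs = zip(allele_a.upper(), allele_b.upper())
    return [pos for pos, (a, b) in enumerate(pairs, start=1) if a != b]


def forms_heteroduplex(allele_a, allele_b):
    """True if strands from the two alleles would pair with at least one mismatch."""
    return len(mismatch_positions(allele_a, allele_b)) > 0


# Made-up 10 nt example sequences (assumption, not taken from the source).
wild_type = "ACGTTGCAAC"
variant = "ACGTCGCAAC"

print(mismatch_positions(wild_type, variant))    # positions that differ
print(forms_heteroduplex(wild_type, wild_type))  # identical alleles give a homoduplex
```

A sample flagged in this way would still be sequenced, as the text notes, to identify the variant nucleotide itself.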

Allele-specific hybridization also can be used to detect PKHDL1 nucleotide sequence variants, including complete haplotypes of a mammal. In practice, samples of DNA or RNA from one or more mammals are amplified using pairs of primers, and the resulting amplification products are immobilized on a substrate (e.g., in discrete regions). Hybridization conditions are selected such that a nucleic acid probe will specifically bind to the sequence of interest, e.g., the PKHDL1 nucleic acid molecule containing a particular sequence variant. Such hybridizations typically are performed under high stringency, as some nucleotide sequence variants include only a single nucleotide difference. High stringency conditions can include the use of low ionic strength solutions and high temperatures for washing. For example, nucleic acid molecules can be hybridized at 42° C. in 2×SSC (0.3 M NaCl/0.03 M sodium citrate/0.1% sodium dodecyl sulfate (SDS)) and washed in 0.1×SSC (0.015 M NaCl/0.0015 M sodium citrate), 0.1% SDS at 65° C. Hybridization conditions can be adjusted to account for unique features of the nucleic acid molecule, including length and sequence composition. Probes can be labeled (e.g., fluorescently) to facilitate detection. In some embodiments, one of the primers used in the amplification reaction is biotinylated (e.g., 5′ end of reverse primer) and the resulting biotinylated amplification product is immobilized on an avidin or streptavidin coated substrate.

Allele-specific restriction digests can be performed in the following manner. For PKHDL1 nucleotide sequence variants that introduce a restriction site, restriction digest with the particular restriction enzyme can differentiate the alleles. For PKHDL1 sequence variants that do not alter a common restriction site, mutagenic primers can be designed that introduce a restriction site when the variant allele is present or when the wild type allele is present. A portion of a PKHDL1 nucleic acid can be amplified using the mutagenic primer and a wild type primer, followed by digestion with the appropriate restriction endonuclease.
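The logic of an allele-specific digest can be sketched as follows. This is a minimal illustration under stated assumptions: the enzyme (EcoRI, recognition site GAATTC, cutting after the first G on the top strand) is chosen only as a familiar example, and both 20 nt sequences are invented rather than taken from PKHDL1 or Table 5. The function names are hypothetical.

```python
ECORI_SITE = "GAATTC"  # example enzyme only; the source does not name an enzyme


def has_site(amplicon, site=ECORI_SITE):
    """Return True if the recognition site occurs in the amplicon."""
    return site in amplicon.upper()


def digest(amplicon, site=ECORI_SITE, cut_offset=1):
    """Return top-strand fragment lengths after cutting at every site occurrence.

    cut_offset is the number of bases into the site after which the enzyme cuts
    (1 for EcoRI, which cuts G^AATTC).
    """
    seq = amplicon.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Invented sequences: the variant differs by one base, which creates GAATTC.
wild_type = "ATGCCGTAGAGTTCCGATTA"
variant = "ATGCCGTAGAATTCCGATTA"

print(digest(wild_type))  # uncut amplicon: one fragment
print(digest(variant))    # cut amplicon: two fragments
```

In this sketch the uncut and cut fragment patterns distinguish the two alleles, which is the readout an agarose or acrylamide gel would provide.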

Certain sequence variants, such as insertions or deletions of one or more nucleotides, change the size of the DNA fragment encompassing the variant. The insertion or deletion of nucleotides can be assessed by amplifying the region encompassing the sequence variant and determining the size of the amplified products in comparison with size standards. For example, a region of a PKHDL1 nucleic acid can be amplified using a primer set from either side of the sequence variant. One of the primers is typically labeled, for example, with a fluorescent moiety, to facilitate sizing. The amplified products can be electrophoresed through acrylamide gels with a set of size standards that are labeled with a fluorescent moiety that differs from the primer.
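The size comparison described above amounts to simple arithmetic, sketched here under the assumption that the fragment sizes have already been read off against the size standards. The 250 bp reference size and the function name are hypothetical.

```python
def classify_size_change(observed_bp, reference_bp):
    """Describe an amplicon size difference relative to the reference allele."""
    delta = observed_bp - reference_bp
    if delta > 0:
        return f"insertion of {delta} bp"
    if delta < 0:
        return f"deletion of {-delta} bp"
    return "no size change"


REFERENCE_BP = 250  # hypothetical wild-type amplicon size

for observed in (250, 253, 246):
    print(observed, classify_size_change(observed, REFERENCE_BP))
```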

Other methods also can be used to detect sequence variants. For example, conventional and field-inversion electrophoresis are known in the art to be useful for visualizing basepair changes. Furthermore, Southern blotting and hybridization can be utilized to detect larger rearrangements such as deletions and insertions.

The association of certain sequence variants with susceptibility to ARPKD or a diagnosis of ARPKD can be determined. An ARPKD-associated (or disease-associated) sequence variant is a sequence variant or combination of sequence variants within the PKHDL1 gene of a subject that is correlated with the presence of ARPKD in that subject. Sequence variants associated with the presence of ARPKD in a subject can include, for example, mutations that will result in truncation of a fibrocystin-L polypeptide or a substantial in-frame alteration within a PKHDL1 transcript from the subject, missense or small in-frame mutations found within a nucleic acid sample of a subject and not found at a significant level in the normal population, and mutations that segregate in ARPKD families in a fashion known in the art to be consistent with autosomal recessive inheritance. Other sequence variants may be identified that are not individually disease-associated, but which may be associated with ARPKD when combined with one or more additional sequence variants. Still other sequence variants can be identified that are simply polymorphisms within the normal population, and which are not associated with ARPKD.

9. Articles of Manufacture

PKHDL1 nucleic acid molecules (e.g., oligonucleotide primer pairs and probes) of the invention can be combined with packaging material and sold as kits for determining whether a subject has altered cellular immunity or is susceptible to developing ARPKD, for diagnosing a patient with ARPKD, or for detecting endometrial cancer, based on the detection of PKHDL1 gene expression or sequence variants within the PKHDL1 gene of the subject. Components and methods for producing articles of manufacture such as kits are well known. An article of manufacture may include one pair of PKHDL1 oligonucleotide primers or a plurality of oligonucleotide primer pairs (e.g., 2, 3, 4, 10, or more than 10 primer pairs). In addition, the article of manufacture may include buffers or other solutions, or any other components necessary to assess whether the PKHDL1 gene of a subject contains one or more variants. Instructions describing how the PKHDL1 primer pairs are useful for detecting sequence variants within a PKHDL1 gene also can be included in such kits.

In other embodiments, articles of manufacture include populations of isolated PKHDL1 nucleic acid molecules immobilized on a substrate. Suitable substrates provide a base for the immobilization of the nucleic acids, and in some embodiments, allow immobilization of nucleic acids into discrete regions. In embodiments in which the substrate includes a plurality of discrete regions, different populations of isolated nucleic acids can be immobilized in each discrete region. Thus, each discrete region of the substrate can include a PKHDL1 nucleic acid molecule containing a different sequence variant (e.g., the sequence variants of Table 5). Such articles of manufacture can include two or more nucleic acid molecules with different sequence variants, or can include nucleic acid molecules with all of the sequence variants known for PKHDL1.

Suitable substrates can be of any shape or form and can be constructed from, for example, glass, silicon, metal, plastic, cellulose or a composite. For example, a suitable substrate can include a multiwell plate or membrane, a glass slide, a chip, or polystyrene or magnetic beads. Nucleic acid molecules or polypeptides can be synthesized in situ, immobilized directly on the substrate, or immobilized via a linker, including by covalent, ionic, or physical linkage. Linkers for immobilizing nucleic acids and polypeptides, including reversible or cleavable linkers, are known in the art. See, for example, U.S. Pat. No. 5,451,683 and WO 98/20019. Immobilized nucleic acid molecules typically are about 20 nucleotides in length, but can vary from about 10 nucleotides to about 1000 or more nucleotides in length.

In practice, a sample of DNA or RNA from a subject is amplified, the amplification product is hybridized to an article of manufacture containing populations of isolated nucleic acid molecules in discrete regions, and hybridization can be detected. Typically, the amplified product is labeled to facilitate detection of hybridization. See, for example, Hacia et al. (1996) Nature Genet., 14:441-447; and U.S. Pat. Nos. 5,770,722 and 5,733,729.

The invention will be further described in the following examples, which do not limit the scope of the invention described in the claims.

Examples

Materials and Methods

Preparation of Resting and Stimulated Immune Cell Sub-Populations

Spleens, thymuses and subcutaneous lymph nodes were dissected from B6 and BALB/c mice under sterile conditions. Cell suspensions were prepared by disruption of the organs in DMEM/10% FCS and passage through 45 μm nylon mesh. For spleen and thymus suspensions, erythrocytes were lysed by 5 min incubation in ACK buffer (0.15 M NH₄Cl, 1.0 mM KHCO₃, 0.1 mM Na₂EDTA, pH 7.2). For flow cytometric sorting, surface staining with a panel of fluorochrome-labeled monoclonal antibodies (BD Pharmingen, San Diego, Calif.) was carried out by incubating the cells in DMEM/10% FCS at 4° C. for 1 hour. Labeled cell suspensions were washed and re-suspended in DMEM/10% FCS at 4 to 8×10⁶ cells/ml and flow sorted using a FACS Vantage sorter (Becton Dickinson Immunocytometry Systems, San Jose, Calif.). The antibody combinations were as follows:
Splenic T-cell sub-populations: anti-mouse CD4-FITC (RM4-4) and anti-mouse CD8α-PE (53-6.7). Sorted populations: CD4^(+ve)/CD8^(−ve) (CD4+ T-cells) and CD4^(−ve)/CD8^(+ve) (CD8+ T-cells).
Splenic dendritic cell sub-populations: anti-mouse CD11c-FITC (HL3) and anti-mouse CD8α-PE. Sorted populations: CD11c^(−ve)/CD8^(−ve) (myeloid) and CD11c^(+ve)/CD8^(+ve) (lymphoid).
Thymocyte sub-populations: anti-mouse CD4-FITC and anti-mouse CD8α-PE. Sorted populations: CD4^(−ve)/CD8^(−ve), CD4^(+ve)/CD8^(+ve), CD4^(+ve)/CD8^(−ve), and CD4^(−ve)/CD8^(+ve).
Splenic NK and NKT cells: anti-mouse CD3ε-FITC (145-2C11) and anti-NK1.1-PE (PK136). Sorted populations: NK1.1^(+ve)/CD3ε^(−ve) (NK-cells) and NK1.1^(+ve)/CD3ε^(+ve) (NKT-cells).
Splenic B-cell sub-populations: anti-mouse IgD-FITC (11-26c.2a) and anti-mouse CD19-PE (1D3). Sorted populations: CD19^(+ve)/IgD^(+ve) (naïve B-cells) and CD19^(−ve)/IgD^(−ve) (memory B-cells).

An aliquot of the memory B-cells was stimulated for 96 hours (activated memory B-cells) with plate-bound goat-anti-mouse IgG (ICN Biomedicals Inc., Aurora, Ohio), 25 ng/ml lipopolysaccharide (LPS, Sigma Aldrich, St. Louis, Mo.), and 2.5 μg/ml purified anti-mouse CD40 (HM40-3, BD Pharmingen). Murine CD4^(+ve) and CD8^(+ve) lymph node T-cells were purified by nylon wool column and complement-mediated depletion, then activated for 72 hours in tissue culture plates coated with a combination of hamster anti-mouse CD3ε (145-2C11) and hamster anti-mouse CD28 (PV-1), as described by Griffin M. D., et al. (2000), J Immunol., 164, 4433-42; or stimulated for 96 hours by co-culture with irradiated allogeneic (B6) bone marrow-derived dendritic cells (allo-stimulated T-cells).

Murine peritoneal inflammatory macrophages were generated from B6 mice by intraperitoneal injection (1 ml/animal) of sterile 3% thioglycollate (Becton Dickinson Microbiology Systems, Cockeysville, Md.). After 7 days, the animals were sacrificed and cells extracted by peritoneal lavage using sterile DMEM/10% FCS, washed and an aliquot retained for RNA preparation (fresh peritoneal inflammatory cells). The remainder of the cells were re-suspended in DMEM/10% FCS and allowed to adhere to tissue culture flasks at 37° C. for 2 hours. Non-adherent cells were removed by washing with sterile PBS and individual cell layers were exposed for 1 hour at 37° C. to PBS alone (unstimulated macrophages) or to PBS containing LPS 2 μg/ml (stimulated macrophages). These solutions then were removed, the cell layers washed with PBS, and culture medium re-applied for 24 hours.

Human lymphocytes were isolated from whole blood using 54% Percoll (American Biosciences, Piscataway, N.J.) and the resulting PBS-washed and pelleted cells were incubated in RPMI/10% FCS with concanavalin A (10 μg/ml) for 72 hours at 37° C.

RNA Analysis

RNA was isolated from snap frozen human tissues (adrenal, breast, colon, heart, kidney, liver, lung, pancreas, placenta and uterus, obtained as surgical waste), mouse tissues, cell-lines and the leukocyte populations described above (see FIG. 1) using the Trizol method (Invitrogen, Carlsbad, Calif.) or with the NucleoSpin (BD Biosciences, San Jose, Calif.) column system. Isolated RNA (1-5 μg) was used to make cDNA with the Clontech Powerscript™ Reverse Transcriptase cDNA Synthesis Kit (BD Biosystems, San Jose, Calif.) and 250 μg random primers (Invitrogen, Carlsbad, Calif.). PKHDL1 (Pkhdl1) expression was analyzed by RT-PCR using 50 ng cDNA and equalized by amplification of the control β-actin (mouse: F 5′-CTGGCACCACACCTTCTACAATGAGCTG-3′ (SEQ ID NO: 31); R 5′-GCACAGCTTCTCTTTGATGTCACGCACGATTTC-3′ (SEQ ID NO: 32)) or GAPDH (human: F 5′-GACCACAGTCCATGCCATCACT-3′ (SEQ ID NO: 33); R 5′-TCCACCACCCTGTTGCTGTA-3′ (SEQ ID NO: 34); 453 bp product). For analysis of murine Pkhdl1, a 258 bp region from exons 76-77 (12429-12686 bp; F 5′-TCCATTTAGCACCTGTTGGGC-3′ (SEQ ID NO: 35); R 5′-AGTCTTCCTACAAGGCACGCTG-3′ (SEQ ID NO: 36)) was amplified and for human PKHDL1, a 247 bp region from exons 35-36 (4232-4478 bp; F 5′-CACCAGTCCTAATGTGTCTGTGG-3′ (SEQ ID NO: 37); R 5′-TGGAGAAAAATGGAGTGAGCCTC-3′ (SEQ ID NO: 38)) was assayed. PCR conditions were as follows: 0.125 U AmpliTaq Gold (Applied Biosystems, Foster City, Calif.) in the supplied buffer, 2.0 mM MgCl₂, 0.2 mM each nucleotide, 4 pM each primer and 50 ng cDNA, with cycling of 4 min at 94° C.; 60 s at 94° C., 30 s at 56-64° C. and 30-60 s at 72° C. for 30-40 cycles; and finally 10 min at 72° C. The products were electrophoresed through 2.0% agarose gels and visualized by ethidium bromide staining. Multi-Tissue Northern Blots (BD Biosciences, San Jose, Calif.) were hybridized and washed using standard methods with the probes: human PKHDL1, Ex 1-12, 979 bp (9-986 nt) and Ex 38-41, 1248 bp (5065-6312 nt) and murine Pkhdl1, ex32-38, 1048 bp (3788-4835 nt).
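The amplicon and probe sizes quoted above follow from their transcript coordinates when both end positions are counted. A minimal sketch of that arithmetic, using coordinates stated in the text (the function and dictionary names are hypothetical, and inclusive 1-based numbering is an assumption about how the positions are reported):

```python
def amplicon_size(start_nt, end_nt):
    """Length of a region given inclusive, 1-based start and end coordinates."""
    return end_nt - start_nt + 1


# Coordinates as given in the text above.
regions = {
    "murine Pkhdl1 RT-PCR, exons 76-77": (12429, 12686),
    "human PKHDL1 RT-PCR, exons 35-36": (4232, 4478),
    "murine Pkhdl1 Northern probe, exons 32-38": (3788, 4835),
}

sizes = {name: amplicon_size(*coords) for name, coords in regions.items()}
for name, size in sizes.items():
    print(f"{name}: {size} bp")
```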

Cloning PKHDL1

The positions of likely human PKHDL1 exons were determined by comparison of genomic DNA (AC021001) to the mouse D86 cDNA sequence, PKHD1, and GenomeScan putative genes (Hs 8205 30 21 2; Hs 8205 30 23 1 and Hs 8205 30 232), and by using the NIX suite of programs (world wide web at hgmp.mrc.ac.uk). The transcript was amplified as 16 overlapping fragments with primers positioned in the most strongly predicted exons (see Table 1) using PCR conditions as described above and human lung or adrenal cDNA. The mouse ORF from the cDNA D86 to the 3′UTR has been cloned as 14 cDNA clones (Table 2). Fragments were cloned into a specific restriction site of pZERO using amplification primers with matching sites, or unmodified product was cloned into terminal transferase (New England Biolabs, Beverly, Mass.) treated, EcoRV cut vector using the Rapid DNA Ligation Kit (Roche Applied Science, Indianapolis, Ind.) and grown in E. coli XL-2MRF′ (Stratagene, La Jolla, Calif.). The 5′ and 3′ regions were amplified and cloned using RACE strategies with the SMART RACE cDNA Amplification Kit (BD Biosciences, San Jose, Calif.). For the 5′ RACE, human adrenal RNA was reverse transcribed with Powerscript RT and amplified with the 5′ RACE CDS and SMART-II primers using touchdown PCR and nested gene specific primers. At the 3′ end, cDNA was synthesized with the tailed and anchored oligo (dT) primer, 3′-CDS, and amplified as above with nested gene specific primers. Products were sequenced using the Big-Dye Terminator Kit (Applied Biosystems, Foster City, Calif.) and analyzed on ABI377 Sequencers. The sequence was assembled into a contig using the Sequencher 4.1.2 program.

TABLE 1 Details of PKHDL1 cDNA clones

Clone | Size (bp) | Exons | Position in coding region | Comments
MH1 | 338 | 1-3 | −104:234 | 5′ RACE
MH2 | 978 | 1-12 | 9:986 |
MH3 | 1291 | 11-20 | 852:2143 |
MH4 | 568 | 20-23 | 2086:2653 |
MH5 | 1097 | 23-31 | 2588:3684 |
MH6 | 934 | 29-36 | 3488:4421 |
MH7 | 925 | 35-38 | 4217:5141 |
MH8 | 1249 | 38-41 | 5065:6312 |
MH9 | 1233 | 40-48 | 6046:7278 |
MH10 | 1185 | 47-49 | 7124:8308 |
MH11 | 702 | 49-51 | 7980:8681 |
MH12 | 1158 | 50-59 | 8586:9743 |
MH13 | 705 | 58-63 | 9618:10322 |
MH14 | 1311 | 61-70 | 10051:11362 |
MH15 | 1038 | 70-75 | 11256:12293 |
MH16 | 961 | 73-78 | 12017:3′ 339 | 3′ RACE

TABLE 2 Details of Pkhdl1 cDNA clones

Clone | Size (bp) | Exons | Position in coding region (nt) | Comments
D86 | 5773 | 1-38 | 1:5773 | Alternative 3′UTR 62 bp into IVS 38
MS1 | 563 | 38-41 | 5618:6180 |
MS2 | 690 | 39-43 | 5921:6610 |
MS3 | 606 | 42-47 | 6495:7100 |
MS4 | 640 | 46-49 | 6903:7542 |
MS5 | 1005 | 48-49 | 7343:8347 |
MS6 | 609 | 49-52 | 8244:8852 |
MS7 | 846 | 51-57 | 8698:9543 |
MS8 | 683 | 56-62 | 9452:10134 |
MS9 | 673 | 61-66 | 10010:10682 |
MS10 | 583 | 66-69 | 10603:11185 |
MS11 | 656 | 69-73 | 11114:11769 |
MS12 | 302 | 71-73 | 11509:11810 |
MS13 | 943 | 73-77 | 11771:12713 |
MS14 | 284 | 77-78 | 12423:3′UTR, 56 |

Mutation Analysis of PKHDL1

All patients in the study gave informed consent and the project was approved by the Institutional Review Board at Mayo Clinic. Five previously described ARPKD patients from families M36, P244, M51, M52 and M55 (Ward C. J., et al. (2002), Nature Genet., 30, 259-269), plus two additional typical ARPKD patients, in whom no PKHD1 mutation was identified (n=4) or in whom only a single missense change was detected (M51, M52 and M55), were screened for PKHDL1 mutations. An additional thirteen typical ARPKD patients with no detected PKHD1 mutation were screened for changes in thirty-four PKHDL1 exons, but as no likely disease causing mutations were identified, the screening of the gene was not completed.

To screen for PKHDL1 mutations, all 78 coding exons were amplified from genomic DNA as 85 fragments of 150-350 bp. Primers were typically positioned in the intron ˜20 bp from the intron/exon boundary. See Table 3 for the sequence of each of the primers. The fragments were amplified using the following protocol: genomic DNA (60 ng), primers (8 pmol each), dNTPs (200 μM each), MgCl₂ (2.5 mM) and 1 U of Taq Gold in the supplied buffer (Applied Biosystems, Foster City, Calif.), in a total volume of 25 μl. The PCR program included: 120 s at 94° C.; 35-40 cycles of 30 s at 94° C., 30 s at 50°-63° C. and 30 s at 72° C.; and 10 min at 72° C. The PCR products were treated to form heteroduplexes and analyzed for base-pair changes using DHPLC on the WAVE Fragment Analysis System (Transgenomic, Omaha, Nebr.), as previously described (Ward C. J., et al. (2002) supra). See also Table 3 for DHPLC conditions. Briefly, crude PCR (300-500 ng) was injected into the chromatographic column (DNASep Cartridge, Transgenomic, Omaha, Nebr.) and eluted using calculated and empirically determined optimal conditions. Samples showing an abnormal chromatogram were further characterized by direct sequencing as previously described (Ward C. J., et al. (2002) supra). Potentially pathogenic changes were validated by DHPLC analysis of 25-100 normal controls (50-200 chromosomes).
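Two of the figures in the protocol above can be checked with simple arithmetic: the number of control chromosomes (two per diploid control at an autosomal locus) and the programmed hold time of the PCR. The sketch below uses the step durations stated in the text; the function names are hypothetical, and the time estimate assumes no ramp time between steps, which a real thermal cycler would add.

```python
def chromosomes_screened(n_controls):
    """Each diploid control contributes two copies of an autosomal gene."""
    return 2 * n_controls


def program_seconds(cycles, initial_s=120, denature_s=30, anneal_s=30,
                    extend_s=30, final_s=600):
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return initial_s + cycles * (denature_s + anneal_s + extend_s) + final_s


for n in (25, 100):
    print(f"{n} controls -> {chromosomes_screened(n)} chromosomes")

for cycles in (35, 40):
    minutes = program_seconds(cycles) / 60
    print(f"{cycles} cycles -> about {minutes:.1f} min of hold time")
```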

TABLE 3 Details of PCR amplicons and DHPLC conditions to analyze the PKHDL1 gene

Exon frag. | Size (bp) | Temp (°C) | Forward primer sequence (5′-3′) | SEQ ID | Reverse primer sequence (5′-3′) | SEQ ID | DHPLC temp (°C) | DHPLC initial % buffer B
1 | 152 | 57 | GCACCAACTCCGCAGAAC | 39 | TGTCTACGCGGGCCTCTCCTGCTTG | 40 | 63 | 47
2 | 302 | 48 | GGCAGAGCCAAAAATAAAACCTG | 41 | ATAGGCTTGAAAATACCTCAACC | 42 | 51 | 54
3 | 242 | 43 | CTCTGAAGATAGAATACC | 43 | TCTGATATGATGAAAATG | 44 | 53 | 52
4 | 311 | 48 | GAATGGAACTTAACTAGACATCAG | 45 | GCAGGAAAGAAAGCAAGATCAAC | 46 | 56 | 55
5 | 159 | 43 | AACTAGAAACAAAACAGAG | 47 | ATTATTTACCATGAAACC | 48 | 53 | 48
6 | 217 | 42 | CTTACACAGAATCTTTTTG | 49 | TTTAATATCATTGGACCC | 50 | 53 | 51
7 | 191 | 44 | TTTAAGGGAACCAGTGAG | 51 | ATCTGCTATTTGTTTTTG | 52 | 54 | 50
8 | 191 | 45 | TGAGTAGTTTTTAAGAGAAAC | 53 | TCTGTGAGCATTATGAAG | 54 | 54 | 50
9 | 198 | 43 | ATTTTGTACTTTTTCTCTG | 55 | TATTTACCCTGTTGAATC | 56 | 53 | 50
10 | 193 | 45 | CTGAAGTTAAAAAAATGTTC | 57 | TTATACACTGCCTTCCCACCACCC | 58 | 53, 55 | 51
11 | 223 | 46 | AAAAATCAGGTATTTGGG | 59 | AGAATTTGTGGTAATAAAG | 60 | 52, 56 | 51
12 | 198 | 46 | GATACACTGATGTGATTTG | 61 | TGTGAATATAGAAATGGC | 62 | 56 | 50
13 | 345 | 45 | TTATGATCCTGATGAAAG | 63 | ACAGTATCAAATAGTATTCTG | 64 | 57 | 55
14 | 179 | 44 | ATAAATGTTGAAAAGGTC | 65 | TAATGACAGGAAAAAGCC | 66 | 54 | 49
15 | 259 | 49 | CTGGAAAAAAGTTATATTCATTAG | 67 | ATTGAGATCCTGCCTTCG | 68 | 54, 57 | 53
16 | 249 | 43 | TTGAATAGCTGAATTATG | 69 | ACAGAAACAGTATCTCCC | 70 | 51 | 52
17 | 251 | 46 | GAAAGATTTCAACTTTTC | 71 | GCACAGGATATACATTTG | 72 | 56 | 53
18 | 243 | 46 | AGTTAATGTCTACAAATTC | 73 | AGAGGGTGCAGGGAAAATGC | 74 | 56 | 52
19 | 204 | 44 | CGAGTGTTCTAACTTTTC | 75 | AGGTCAGTTTCACAGTTC | 76 | 52, 54 | 50
20 | 243 | 45 | TTGGGGGAAAACCAGAAC | 77 | AGATTCAATTAGCATATC | 78 | 55 | 52
21 | 336 | 45 | TCTCTGTGTTCTGGTACG | 79 | CCCAAGAAATGGATTTGTCTTATC | 80 | 52 | 55
22 | 326 | 46 | GCTTTCTAAAGTGTATTTGC | 81 | TGTTTCTATCCATACTGC | 82 | 52, 55 | 55
23 | 343 | 46 | TTACATGGCAAAAACCAC | 83 | TAGTTAGCTGTCTTTTCC | 84 | 52 | 55
24 | 300 | 46 | ACATGAGGCTCATTTATG | 85 | TGTGTGTGCGTATACACC | 86 | 54, 55 | 54
25 | 276 | 51 | AAGATTACAGGCGTGAAC | 87 | GCACATAGAAGAAAAGAG | 88 | 62 | 53
26 | 297 | 44 | GACTTTTATTCACCTTTG | 89 | GTCTTTAACATATTACAAAC | 90 | 54 | 54
27 | 221 | 47 | CTTTTGTTAAAACCTATTC | 91 | CTTTCACACCCAGTATAG | 92 | 57 | 51
28 | 244 | 49 | TGTCTGATTTCATAACAACAGG | 93 | CCTTTGATTCCACTTTATCTTAGAG | 94 | 55 | 50
29 | 226 | 49 | CATCTTTTTCTTTTTTTCAC | 95 | CATGCAATTTTCTCTCTG | 96 | 58 | 52
30 | 204 | 44 | TATAGTTGAACTGTTTTG | 97 | AGGAGGAAAAAGTGACTG | 98 | 55 | 50
31 | 215 | 44 | GCATTTCTGTATCTCAAC | 99 | TGACCAATCTTATTGAAG | 100 | 54 | 51
32 | 322 | 47 | AAGATGAGAGATGAATTG | 101 | AACTCCATCAAGTTTATG | 102 | 54 | 55
33 | 212 | 45 | GAAGCTCATTGAAAAATC | 103 | GATAATCACTTTCCTATG | 104 | 55 | 51
34 | 204 | 45 | ATTTGACAAAATGTTTGC | 105 | TCAGGTTTCAGTGCTTCC | 106 | 55 | 50
35 | 313 | 47 | ATTGCCATGTTGTCAAAG | 107 | CATTTAGGAAAAAGTGAC | 108 | 52, 57 | 55
36 | 250 | 46 | GTACAATCTCATTTTATG | 109 | TATCACATACACCCTGGG | 110 | 57 | 53
37 | 325 | 45 | AAACAGTTATCATTTTGG | 111 | CATATAATAGAAGTACAAAG | 112 | 56 | 55
38a | 349 | 50 | ACTGGAGGTATGTATTGACTTG | 113 | TGACCCATAAGGACTTTTACAC | 114 | 56 | 55
38b | 347 | 50 | AAAAGGCTCTGGATTTGC | 115 | CATTAACCTCTATTGCTCTGAAC | 116 | 58 | 55
38c | 334 | 50 | GGTTTGGGGACTGTTTTG | 117 | AGTAGACTTCATTGGGGTTG | 118 | 58 | 55
38d | 350 | 50 | GGAAATGGCTTCTATCCAG | 119 | AGGTATTAAGTGTAAGTGGGAAC | 120 | 52, 56 | 56
39 | 472 | 48 | AGACTGTAGGGTATATTGTAGTC | 121 | GAAACAAAATATCTGCAGGTTC | 122 | 52 | 55
40 | 350 | 47 | ATCAAAAGAGATTCAGTTGC | 123 | CTGCCATTACTTTTTCTGAC | 124 | 52 | 55
41 | 281 | 55 | GGAGGTTTTGGAAATGAATCAG | 125 | TGGAAATGCACAATGATGCGTG | 126 | 60 | 54
42 | 275 | 52 | AAAGGGTTTGACAGTGTGATCTAG | 127 | ATGCTGGTTTTCTATTGCTGTG | 128 | 56 | 53
43 | 253 | 52 | CGAATGAAAAACTCTGGTAAAATCC | 129 | TCAGGCAGAGTCCAATGAACAG | 130 | 55 | 53
44 | 268 | 46 | TGTAATGAATAATTTAATAGGTAAC | 131 | AAGATAAACTTAGGAGAGGTTG | 132 | 53 | 53
45 | 223 | 54 | TGGATTTGGGGTTTTAATTTTC | 133 | GAGTCTTCCTCTACCAACTCCC | 134 | 60 | 51
46 | 231 | 49 | AGTTCTCAATAACAAATCAAAC | 135 | CTTTTCTAAATACACATCATTAAG | 136 | 57 | 52
47 | 344 | 48 | TACCAAAACAATATGTTATGTC | 137 | GCATGATTATACCAACCACGAG | 138 | 56 | 55
48 | 241 | 50 | TCTTCAATATAAGAGGATTCCG | 139 | TAACCTTGAGCAAACCACTGTG | 140 | 55 | 52
49a | 304 | 50 | CTAAATAACTGTGATTTCTGGG | 141 | GAAGACTGGTACTTTGCTGTAC | 142 | 56 | 54
49b | 303 | 53 | CCAGTATAACTTGGCAGTATTTG | 143 | GTACAAGATCCCGTTTGCATGG | 144 | 58 | 54
49c | 333 | 53 | ATTTCCCCATGCAAACGGGATC | 145 | CAGAAGAGACAGTCAAGCCTTC | 146 | 59 | 55
49d | 300 | 41 | GGTTCTCCCATTTAGTGAAGGC | 147 | CAATTCAATTCTGTGCTAACAC | 148 | 53, 58 | 54
50 | 314 | 50 | CCTTTTTTATGTTTCTTAATGTG | 149 | ATGATGACAAAAGTTTAGGAAG | 150 | 52, 58 | 55
51 | 269 | 46 | GGAGGAGTTTATTAGAGG | 151 | ATGTAGGCTGTGTTTGGG | 152 | 54 | 53
52 | 299 | 47 | TAAATCTTAACATAATATAGGGG | 153 | TTAGATAAACTATCATTTCTGCC | 154 | 52, 53 | 54
53 | 254 | 50 | TTTGGTCACTATGTTCATTTAAC | 155 | AGATATTGAAGGGTATCAACTAC | 156 | 57 | 51
54 | 240 | 48 | CATTTTTTTTCTTCTCTACCATG | 157 | ACATTTCATTCATTTGTGTTTAC | 158 | 55 | 52
55 | 281 | 47 | AAGTTGTAGTTTATGGATTATG | 159 | TGCTTCTTTCTTATTATTTGAG | 160 | 51, 56 | 54
56 | 272 | 49 | GGGTGGATTTTTTTTCCTGGTC | 161 | AACTGATATGTACTTTAGTGCC | 162 | 55 | 53
57 | 227 | 48 | TTTATACTAGCACCTAACTCAG | 163 | CCACTGTGTATATTCATTTTCC | 164 | 57 | 52
58 | 300 | 47 | CATAATTGCCAATGAGATATAC | 165 | GTAAATGTGAATCTTTCAACAC | 166 | 53 | 54
59 | 284 | 49 | CTTCTCAGCATTGGCAATAATC | 167 | GAGCTGACTACATATAGATGAG | 168 | 55 | 54
60 | 250 | 47 | CAAAAATGTTTTATTCCAACTG | 169 | AAGATGTGGCTATTTAGAAGTC | 170 | 52 | 53
61 | 283 | 47 | TGAGTATTGATTATTGATAAAGG | 171 | CCACAGGATGTGTAATTTGAACC | 172 | 52 | 54
62 | 270 | 48 | GCAAATTGACTTATGTTTTTTGGGG | 173 | CATTCACTCCTTTAGTTAGCTC | 174 | 52 | 53
63 | 208 | 48 | GTGTATTGTCATATACTTACTCTCG | 175 | CTAGTTTTAGCGATTCCTGG | 176 | 55 | 51
64 | 243 | 47 | TTCTCTGGTTCTATATTTCC | 177 | CAGGTTACATAATACTAAGGAC | 178 | 54 | 52
65 | 305 | 51 | TTTGGACATGCTGGGATTATGG | 179 | TTCAGAAATTCCACCCTTCTCC | 180 | 55 | 54
66 | 300 | 51 | CCCATGTTTTCTTTTAGTAAGAGC | 181 | ATGAGCTGAAGCAAAGGTAGGC | 182 | 56 | 54
67 | 301 | 48 | TGAACTCACTGCTGCTCATCGG | 183 | TATCCTCTACATATTCTTTACAG | 184 | 56 | 55
68 | 302 | 48 | GGCAGAATGTGCATTAAATCTG | 185 | GGAGGAAGTGAGAATGAAAAAC | 186 | 52 | 54
69 | 327 | 51 | CAAGTGTATTCATATTGCTCTCTAG | 187 | GCCTAATGACAGATTAAGCAAG | 188 | 57 | 55
70 | 333 | 48 | CTAGCATAACAAGAAATAGATG | 189 | AATTTATGAGATGGCTTCATGC | 190 | 53 | 55
71 | 301 | 53 | GGAGTATGCACTTTCATTTTGC | 191 | ATGAGCTGTAAGGCTGACAATG | 192 | 58 | 54
72 | 258 | 47 | ATATTGAAGGACGGTTTAAGTG | 193 | TAAGTACATTTTCCATGTGTAC | 194 | 54 | 53
73 | 409 | 52 | CTGTGATGTTCTGGCTTTTTTC | 195 | ATTGCATTCCTCCATCTCAAAC | 196 | 55 | 57
74 | 327 | 48 | AGAATGCTAAAGTGAAAAACTC | 197 | GTTTTGAAATAGAAACAGAGAG | 198 | 54 | 55
75 | 276 | 52 | CTGCTGAGTGTAGTTTATCATG | 199 | GAGTGAAACTGGCTCATCCTTC | 200 | 56, 58 | 53
76 | 267 | 48 | TTTAAAAGCATGGAAACAGGAC | 201 | TATAATTGTCTCTATTTATGGC | 202 | 54, 55 | 53
77 | 367 | 51 | AGGAAATCAAACACTATGATGC | 203 | GATATCATGCACAAGAGCTGTG | 204 | 56 | 56
78 | 342 | 46 | TATGCTATTTCTACTTAAAAATTG | 205 | TTTGTTGGTACAATAACTTAGAGG | 206 | 52 | 55

Sequence Analysis

The intron/exon structure of PKHDL1 was determined by comparison with genomic sequence using MacVector 7.0 and SIM4 (pbil.univ-lyon1.fr/sim4.html). The sequence of the murine Pkhdl1 transcript was determined by comparison of human PKHDL1 sequence and the Pkhdl1 cDNA clone, D86, to mouse genomic sequence using MacVector. The genomic sequence was used as the authentic sequence for the human and murine transcripts and the numbering of the transcript was from the start codon.

BLAST was used to screen for homologous sequences in the GenBank database. Comparison between orthologs, and between fibrocystin and fibrocystin-L, was made by BLAST2 and the Pustell protein/DNA sequence alignment tool of MacVector. Protein domains were defined using the Pfam database (pfam.wustl.edu). To analyze protein topology, the programs SOSUI (sosui.proteome.bio.tuat.ac.jp/sosuiframe0.html), TMHMM (v2.0) (cbs.dtu.dk/services/TMHMM-2.0) and SignalP (v2.0) (cbs.dtu.dk/services/SignalP/) were used. Potential N-glycosylation sites and phosphorylation sites were identified with MacVector and alignments made with the ClustalW (v1.4) program, within MacVector.

GenBank Accession Numbers

Fibrocystin (human) AAL74290; (mouse) AAN05018; D86 (mouse), NP_619615; DKFZp586C1021, XP_488444; PKHDL1 cDNA clone ADBBEB10 (5′), AV706327; PKHDL1 genomic sequence (human) RP11-419L20, AC021001; (mouse) NW_000106 (44,650K-44,800K). Fibrocystin-L related proteins: HGFR (mouse), NP_032617; plexin 1 (mouse), NP_032901; TMEM2 (human) NP_037522; XP_051860 (human) XP051860; hypothetical protein from C. aurantiacus, ZP_00018581 and hypothetical protein from T. tengcongensis, NP_621862. GenomeScan and other predicted proteins similar to fibrocystin-L: Hs8 8205 30 32 2; Hs 8205 30231; Hs8205 30 232 (human); LOC271264 (mouse) XP_194970. Fugu PKHDL1, genomic sequence Scaffold 2621, CAAB01002621. PKHDL1 and Pkhdl1 cDNA sequences, AY219181 and AY219182, respectively.

Example 1 Identification and Cloning of PKHDL1 and Pkhdl1

To identify the human ortholog of D86, the 1945 aa protein sequence was analyzed against the human genomic sequence by BLAST. The likely human ortholog was identified by a strong hit in genomic sequence from chromosome region 8q23 in the BAC clone RP11-419L20. Comparison of the genomic sequence to full-length fibrocystin using Pustell protein/DNA sequence alignment and BLAST showed that the homology on chromosome 8 extended over most of the length of the disease related protein, covering at least 150 kb of genomic DNA. This region not only contained homology to D86 but also matched the previously described cDNA, DKFZp586C1021, that is similar to the 3′ region of PKHD1, indicating that these cDNAs are part of the same large gene.

To clone the full-length human D86 ortholog (the PKHD-like 1 gene, PKHDL1), an RT-PCR exon linking approach was used with primers located in exons strongly predicted by GenomeScan and NIX analysis, and by homology with fibrocystin. The full-length transcript was cloned as 16 overlapping fragments (see Table 1 for details) and the 5′ and 3′ ends of the mRNA identified and cloned by RACE strategies (as described above). RNA from human lung and adrenal was used for the RT-PCR and all products were cloned and sequenced. There was some evidence of alternative splicing, but sequence from the largest amplified fragment in each case was assembled into a contig containing an ORF of 12,729 bp. PKHDL1 has a 5′ untranslated region (UTR) of 104 bp and the putative start codon is the first in-frame ATG in the sequence. The start codon does not strongly match the Kozak consensus, but overall 5 of 13 sites, including +4 and −2, match the consensus. The 3′ UTR is 248 bp and has a typical polyadenylation signal preceding the site of polyA addition by 21 bp. The total transcript is 13,081 bp.
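The total transcript length stated above is the sum of the three parts reported in the same paragraph, and the ORF length divides evenly into codons. A minimal sketch of that check, using only the figures given in the text (the variable names are hypothetical):

```python
# Lengths in bp, as stated in the paragraph above.
FIVE_PRIME_UTR = 104
ORF = 12729
THREE_PRIME_UTR = 248

transcript_length = FIVE_PRIME_UTR + ORF + THREE_PRIME_UTR
codons, leftover_nt = divmod(ORF, 3)

print(f"total transcript: {transcript_length} bp")
print(f"ORF: {codons} codons, {leftover_nt} nt left over")
```

The codon count agrees with the last amino acid position listed for exon 78 in Table 4.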

Comparison of the PKHDL1 transcript to the genomic sequence showed that the gene contains 78 exons (see Table 4 for details) and the total genomic size of the gene is 167,918 bp. Two splice donor sites (for IVS 8 and 67) have the non-canonical GC sequence rather than the typical GT. As is often found in the ˜0.5% of splice donor sites that have a GC, the rest of the donor sequence (at both exons) closely matches the splice site consensus. The transcriptional start of PKHDL1 is associated with a CpG island.

Many PKHDL1 exons were identified by gene prediction programs and GenomeScan defined the human and murine gene as three and two different genes, respectively (see Methods for details). Of the 78 PKHDL1 exons, 53 were predicted correctly, 3 exons had one different splice junction (one associated with the GC splice donor) and 22 exons were not predicted. A further 6 exons were predicted that were not found in the final transcript. Therefore, although these prediction programs are helpful to identify exons, RT-PCR and sequencing were required to define the most likely gene sequence. In this case, the availability of the D86 murine cDNA of Pkhdl1 (exons 1-38) and human cDNA DKFZp586C1021 (exons 69-78) helped determine the structure of the gene.
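The exon prediction counts above can be tallied as follows. The counts come from the text; the two summary fractions are an illustrative way of expressing them and are not figures reported in the source, and the variable names are hypothetical.

```python
# Counts stated in the paragraph above.
predicted_correctly = 53
one_junction_different = 3
not_predicted = 22
predicted_but_absent = 6

true_exons = predicted_correctly + one_junction_different + not_predicted
all_predictions = predicted_correctly + one_junction_different + predicted_but_absent

# Illustrative summaries (assumption: only exactly predicted exons count as correct).
fraction_of_exons_found_exactly = predicted_correctly / true_exons
fraction_of_predictions_exact = predicted_correctly / all_predictions

print(f"true exons: {true_exons}")
print(f"exons predicted exactly: {fraction_of_exons_found_exactly:.1%}")
print(f"predictions that were exact: {fraction_of_predictions_exact:.1%}")
```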

TABLE 4 Intron/exon structure of human and murine PKHDL1

Exon^(▴)  Coding region  Position^(□)              Human IVS  Mouse IVS
number    size (nt)      nt            aa          size (nt)  size (nt)
1         177*           1-73          1-25        1893       1767
2         90             74-163        25-55       16733      6795
3         145            164-308       55-103      948        916
4         109            309-417       103-139     1498       ?
5         58             418-475       140-159     1409       1994
6         94             476-569       159-190     2866       4111
7         54             570-623       190-208     528        491
8         74             624-697       208-233     1299^(†)   980
9         43             698-740       233-247     3920       4065
10        71             741-811       247-271     1541       1440
11        111            812-922       271-308     2321       1926
12        90             923-1012      308-339     1527       2256
13        269            1013-1281     338-427     1152       464
14        92             1282-1373     428-458     2965       1695
15        160            1374-1533     458-511     281        377
16        136            1534-1669     512-557     1204       647
17        144            1670-1813     557-605     1570       606
18        158            1814-1971     605-657     1658       1281
19        114            1972-2085     658-695     2286       1661
20        150            2086-2235     696-745     1006       580
21        125            2236-2360     746-787     5551       1080
22        164            2361-2524     787-842     1257       173
23        173            2525-2697     842-899     4394       5891
24        148            2698-2845     900-949     1769       1093
25        155            2846-3000     949-1000    2183       1908
26        123            3001-3123     1001-1041   469        1128
27        106            3124-3229     1042-1077   3086       1900
28        111            3230-3340     1077-1114   1973       1467
29        165            3341-3505     1114-1169   983        869
30        122            3506-3627     1169-1209   1864       1753
31        133            3628-3760     1210-1254   440        764
32        196 (193)      3761-3956     1254-1319   1617       1516
33        143            3957-4099     1319-1367   422        602
34        105            4100-4204     1367-1402   627        639
35        189            4205-4393     1402-1465   750        739
36        171            4394-4564     1465-1522   559        312
37        227            4565-4791     1522-1597   758        731
38        985            4792-5776     1598-1926   2497       2813
39        249            5777-6025     1926-2009   946        637
40        150            6026-6175     2009-2059   1487       1821
41        175            6176-6350     2059-2117   974        921
42        157            6351-6507     2117-2169   437        370
43        157            6508-6664     2170-2222   1292       1402
44        80             6665-6744     2222-2248   476        670
45        130            6745-6874     2249-2292   1409       1073
46        130            6875-7004     2292-2335   3203       2392
47        242            7005-7246     2335-2416   1935       1750
48        137            7247-7383     2416-2461   2307       980
49        1030           7384-8413     2462-2805   1332       ?
50        192            8414-8605     2805-2869   8348       3611
51        152            8605-8757     2869-2919   1238       1191
52        160            8758-8917     2920-2973   557        727
53        172            8918-9089     2973-3030   2154       970
54        89             9090-9178     3033-3060   351        498
55        149            9179-9327     3060-3109   1293       1728
56        130            9328-9457     3110-3153   1424       1253
57        119            9458-9576     3153-3192   1938       728
58        130            9577-9706     3193-3236   1474       1504
59        174            9707-9880     3236-3294   3130       1839
60        104            9881-9984     3294-3328   916        1864
61        130            9985-10114    3329-3372   771        1099
62        122            10115-10236   3372-3412   1666       375
63        91             10237-10327   3413-3443   3167       3522
64        149            10328-10476   3443-3492   82         83
65        123            10477-10599   3493-3533   1189       480
66        112            10600-10711   3534-3571   81         80
67        117            10712-10828   3571-3610   5555^(†)   3776^(†)
68        166            10829-10994   3610-3665   3170       3067
69        233            10995-11227   3665-3743   201        190
70        168            11228-11395   3743-3799   2512       2503
71        158            11396-11553   3799-3851   4235       1416
72        136            11554-11689   3852-3897   2861       2838
73        342            11690-12031   3897-4011   3677       2164
74        152            12032-12183   4011-4061   406        2434
75        147            12184-12330   4062-411-   342        283
76        154            12331-12484   411-4162    3397       1837
77        237 (258)      12485-12721   4162-4241   3059       4026
78        254^(•) (209)^(▪)  12722-12729   4241-4243

^(▴)mouse size in brackets if different from human; *human 5′ UTR = 104; mouse 5′ UTR not determined; ^(†)atypical GC splice donor; ^(□)in human and mouse to exon 32. Exon 32-77 mouse position 1 codon less. Exon 77-78 mouse position 6 codons more; ^(•)human 3′ UTR = 248; ^(▪)mouse 3′ UTR = 201
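The coding sizes and nucleotide positions in Table 4 can be cross-checked with simple arithmetic, since an exon's span should equal its last position minus its first position plus one. The following is a minimal sketch of such a check using four rows copied from the table as printed; the function name is hypothetical, and a flagged row may only reflect a transcription slip in the printed positions.

```python
# Rows copied from Table 4 as printed: (exon, listed coding size in nt, first nt, last nt).
ROWS = [
    (2, 90, 74, 163),
    (3, 145, 164, 308),
    (38, 985, 4792, 5776),
    (51, 152, 8605, 8757),
]


def span_mismatches(rows):
    """Return (exon, listed size, computed span) for rows where the two differ."""
    flagged = []
    for exon, listed, first, last in rows:
        computed = last - first + 1  # inclusive span of the coding positions
        if computed != listed:
            flagged.append((exon, listed, computed))
    return flagged


print(span_mismatches(ROWS))
```

Exon 51 is flagged because its printed start position (8605) repeats the end position of exon 50; a start of 8606 would give the listed size of 152 nt.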

To confirm the structure of PKHDL1 and determine if the D86 cDNA contained the entire mouse ortholog, the sequence of the mouse transcript was determined. The human transcript was compared to murine genomic sequence by BLAST and the NW000106 Mouse Supercontig was identified, indicating that Pkhdl1 is located in mouse chromosome region 15B3. Strong similarity between PKHDL1 and the murine genomic sequence (and also using the D86 cDNA) enabled the full-length Pkhdl1 ORF of 12,747 bp to be predicted. The mouse ORF from the D86 cDNA to the 3′ UTR also was cloned as 14 RT-PCR fragments (see Table 2), confirming the structure of the gene. The intron/exon structure of Pkhdl1 is the same as its human counterpart, with 78 exons and all exon sizes, except those of exons 1, 32, 77 and 78, the same in human and mouse (see Table 4 for details). The atypical GC splice donor to exon 67 is conserved in the mouse, but the exon 8 donor is GT in this organism. The murine Pkhdl1 gene is also associated with a CpG island.

Example 2 Tissue Expression of PKHDL1

Initially, PKHDL1 cDNAs of human and mouse were hybridized to multiple tissue northern blots but no clear bands were visualized. Faint smears were seen in many lanes (FIG. 1 a). The problem of visualizing PKHDL1 as a specific transcript by northern blotting appeared similar to that seen with PKHD1. In the case of PKHDL1, the problem of resolving the transcript as a discrete fragment was compounded by its low level of expression. Faint smears on the northern blot may reflect particular sensitivity of this transcript to degradation or, as for PKHD1, the presence of multiple alternatively spliced transcripts. Therefore, RT-PCR was used to examine the tissue expression of this gene.

Analysis of human adult material showed expression in most tissues. Analysis of a fuller range of tissues was possible in mouse and this also showed that expression levels were low, with the product found in most tissues, both newborn and adult, after multiple cycle PCR (see FIG. 1 b). PKHDL1/Pkhdl1 therefore appeared to be expressed at a low level in most tissue types. The widespread low level of expression of PKHDL1 may reflect the association of a CpG island with the promoter of this gene. CpG islands often are associated with more widely expressed genes.

There was evidence of possible alternative splicing of PKHDL1 as several RT-PCR reactions generated more than one product. The GC splice donor found at two exon junctions is also often associated with alternative splicing. Furthermore, one human adrenal gland cDNA clone, ADBBEB10, that was fully sequenced, extends exon 18 a further 886 bp and leads to a novel 3′ UTR with an atypical ATTAAA polyadenylation signal shortly before the site of polyA addition. D86, the originally described transcript of the 5′ part of Pkhdl1, has an extension of exon 38 into IVS38, producing a 3′ UTR of 62 bp, although no clear polyadenylation site is present in this sequence. It therefore seems likely that PKHDL1, like PKHD1, will generate many alternatively spliced transcripts, some predicted to produce secreted proteins (such as ADBBEB10 and D86), as well as the membrane-bound form indicated by the longest ORF. There are significant regions of breakdown in homology between the two proteins (see FIG. 2 a). Alternative splice forms of the fibrocystin and fibrocystin-L proteins may match the other homolog better.

Example 3 Are Mutations to PKHDL1 Associated with ARPKD?

Previously, mutation analysis of PKHD1 revealed a population of clinically well-characterized ARPKD patients in which no mutation was identified. The homology of PKHDL1 to PKHD1 and expression in kidney and liver suggested that it could be a candidate as a second ARPKD gene. To test this hypothesis, ARPKD patients from seven families without definite PKHD1 mutations were screened for base-pair changes throughout the gene. The gene was partially screened in a further 13 PKHD1 mutation-negative ARPKD patients (see Methods for details). The 78 PKHDL1 coding exons and flanking intronic sequences were amplified from genomic DNA and analyzed for base-pair mismatches by denaturing high-performance liquid chromatography (DHPLC; see Methods for details). This analysis revealed 17 exonic changes (see Table 5 for details), including silent changes at the amino acid level and conservative and non-conservative substitutions. No nonsense or deletion/insertion mutations were found. Seven non-conservative changes were screened in normal controls; five of these were found in that population (see Table 5), but two, Q/P702 and L/S1199, were not detected. Analysis to determine whether these two substitutions segregated with the disease was uninformative. The L/S1199 change, however, is probably not ARPKD associated, as the family in which this change was detected (M52) also has the PKHD1 substitution T36M. Although initially the significance of T36M was unclear, finding this change in other ARPKD families showed that M52 is a typical (PKHD1 mutated) ARPKD family. The Q702 residue is conserved in the mouse ortholog, but not in fibrocystin (where it is aspartic acid), and the pathogenic significance of the Q/P702 change remains unclear.

In summary, analysis of PKHDL1 in a group of ARPKD patients without detected PKHD1 mutations revealed a number of missense changes but no inactivating mutations (see Table 5). Although one non-conservative change was found only in an ARPKD family, overall the data did not provide compelling evidence associating this gene with ARPKD, even if this possibility cannot be entirely excluded. The lack of association of PKHDL1 with ARPKD is consistent with the major sites of expression of this gene in blood cell lineages.

TABLE 5 Detected variants in PKHDL1

             Amino Acid Position/        Allele Frequency   Allele Frequency
Designation  Nucleotide Change     Exon  ARPKD Population   Normal Controls
I/V164       490A/G                6     3/14
1227G/A      K409                  13    3/14
1404C/T      Y468                  15    2/14
1920T/C      N640                  18    1/14
1965A/G      E655                  18    1/14
Q/P702       2105A/C               20    1/14               0/200
T/A1192      3574A/G               30    4/14               13/100
L/S1199      3599T/C               30    1/14               0/100
G/V1223      3668G/T               31    1/14               1/200
R/S1514*     4540C/A               36    4/14               17/100
V/I1607      4819G/A               38    1/14
Y/C1638      4913A/G               38    3/14               12/100
6621C/G      L2207                 43    3/14
9084A/T      T3028                 53    3/14
H/Q3050      9150C/G               54    4/14               14/64
D/E3607      10821C/A              67    1/14
V/I4220*     12658G/A              77    1/14

*Polymorphic change detected in cDNA sequence

Example 4 Cellular Expression of PKHDL1

To determine the cell types that express PKHDL1, tissue-specific cell lines were analyzed by RT-PCR (FIG. 1 c). These showed the highest level of expression in K562, an erythroleukemia cell line, and stimulated T-cells, with the only other expressing cells being EBV-transformed lymphoblasts. Expression limited to blood-derived cells is consistent with the database description of D86 as a lymphocyte-secreted protein. Furthermore, the tissue origin of described murine Pkhdl1 ESTs is in thymus (n=5) and lymph node (n=2), as well as adrenal (n=2).

Detection of the PKHDL1 transcript in organs that are composed entirely of immune cell subtypes (spleen and thymus) as well as in activated T-cells and B lymphoblasts suggested that expression of PKHDL1 may be important within cells of the immune system. To determine whether the expression is confined to specific immune cell populations or to states of immune activity, flow cytometric sorting from murine lymphoid organs and in vitro activation protocols were carried out (see Methods for details). RT-PCR analysis of RNA isolated from these cells resulted in detection of Pkhdl1 at low cycle number only in activated bulk T-cells and purified CD4^(+ve) (helper) and CD8^(+ve) (cytotoxic) T-cells (FIG. 1 d). Strong in vitro stimulation of B-cells and inflammatory macrophages did not result in high-level expression. At high cycle number, expression was also detectable in CD4^(+ve), CD8^(+ve) (double positive) thymocytes, resting naïve and memory B-cells, unstimulated and stimulated peritoneal macrophages, resting CD4^(−ve) and CD8^(+ve) T-cells, NKT-cells, and both CD8^(+ve) and CD8^(−ve) dendritic cells.

In summary, analysis of highly purified cell populations from murine lymphoid organs indicated that fibrocystin-L is up-regulated in T-cells following activation and, therefore, may serve a specific function in cellular immunity. Increased expression of mRNA in purified helper (CD4^(+ve)) and cytotoxic (CD8^(+ve)) T-cells was detected following activating stimuli delivered by lectins, allogeneic antigen presenting cells (APCs), and immobilized antibodies to the T-cell receptor and the co-stimulatory receptor CD28. In contrast, strong activation stimuli failed to induce up-regulation of Pkhdl1 in memory phenotype B-cells and inflammatory macrophages.

Example 5 The Structure of Fibrocystin-L

Analysis of the longest ORF of the PKHDL1 sequence allowed the structure of the corresponding protein, termed fibrocystin-like (fibrocystin-L), to be determined. Fibrocystin-L was predicted to be larger than fibrocystin, with 4243 aa and a calculated unglycosylated molecular mass of 466 kDa. A signal peptide was predicted at the N-terminal end with cleavage after the sequence CAA (see FIG. 2 a). Analysis of likely transmembrane regions in fibrocystin-L gave conflicting results but the most likely structure (predicted by SOSUI) is of a single transmembrane domain, from 4213-4235 aa, leaving a short, 8 aa, cytoplasmic tail. The predicted topology was therefore similar to fibrocystin, with a large, 4212 aa, extracellular region and a single transmembrane pass (FIG. 3). The extracellular region contains 56 putative N-linked glycosylation sites, indicating that this region may be highly glycosylated. A single potential protein kinase C phosphorylation site is found in the C-terminal tail at position 4239 aa.
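Putative N-linked glycosylation sites of the kind counted above are commonly located by scanning for the consensus sequon N-X-S/T, where X is any residue except proline. This document does not state which tool produced the count of 56, so the following is only a minimal sketch under that assumption; the function name and the toy peptide are hypothetical and are not fibrocystin-L sequence. The last lines repeat the topology arithmetic from the positions given above.

```python
import re

# Consensus sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNST") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sites(protein):
    """Return 1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein)]


# Hypothetical toy peptide for illustration only.
toy_peptide = "MANVTQNPSLNNSTK"
print(n_glyc_sites(toy_peptide))

# Topology arithmetic from the text: 4243 aa protein, transmembrane domain 4213-4235.
PROTEIN_LENGTH, TM_START, TM_END = 4243, 4213, 4235
extracellular_length = TM_START - 1           # residues 1..4212
cytoplasmic_tail = PROTEIN_LENGTH - TM_END    # residues 4236..4243
print(extracellular_length, cytoplasmic_tail)
```

In the toy peptide the motif N-P-S at position 7 is skipped because proline occupies the middle position.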

The protein most similar to fibrocystin-L is fibrocystin, which is homologous from the N-terminal end to 4185 aa (see FIG. 2 a). In this region, the two proteins show homology of 25.0% and similarity of 41.5%. There is no significant homology between the two proteins in the transmembrane or short cytoplasmic regions. As with fibrocystin, the most clearly recognized protein domain in fibrocystin-L is the TIG/IPT domain. Analysis by Pfam indicated that fibrocystin-L contains 14 copies of this immunoglobulin-like fold, with three immediately after the signal peptide and the remaining 11 in tandem from 1067 aa to 2177 aa, with a gap between 1470 aa-1566 aa; almost one-third of the protein is in TIG domains (FIGS. 2 a and 3). FIG. 2 b shows that all the TIG domains closely match the TIG consensus and are similar to the corresponding domains of fibrocystin, the HGFR and members of the plexin family. In all the fibrocystin-L TIG domains, apart from TIG10, the cysteine residues, which are important to stabilize the domain through the formation of a disulfide bond, are present.

A second region of significant homology is to a protein of unknown function, TMEM2, and two related proteins. This region of homology also was noted with fibrocystin. Two TMEM regions of homology with the fibrocystins have been defined: TMEM domain-A (2180 aa-2375 aa) and -B (3032 aa-3376 aa; FIGS. 2 a and c). The size of TMEM-B has been extended further N-terminal compared to the area of homology that was previously described with fibrocystin. Interestingly, this homology is not only with the previously described proteins, TMEM2 and XP051860, but also to a newly described hypothetical protein from the thermophilic, filamentous, photosynthetic bacterium Chloroflexus aurantiacus. Indeed, the bacterial protein is more clearly related to the TMEM domains of fibrocystin and fibrocystin-L than either of the other human proteins, as it does not need to be gapped to match the sequence (FIG. 2 c).

In summary, the description of a second member of the fibrocystin protein family has helped to refine the likely structure of these proteins. A notable difference between fibrocystin-L and fibrocystin is the length of the predicted cytoplasmic tail. In fibrocystin-L it is only 8 aa, while in fibrocystin it contains 192 aa and has several possible PKA, PKC and casein kinase phosphorylation sites. Although the short fibrocystin-L tail in humans has a single potential PKC site, this is not conserved in the mouse. Fibrocystin-L is predicted to have 14 TIG domains, far more than the seven predicted in fibrocystin by Pfam and other programs. Inspection of sequence alignments of the two proteins suggests that fibrocystin may have further TIG-like domains C-terminal to those defined previously (see FIGS. 2 a and 3). A second important region of homology that is present twice in the fibrocystins is with TMEM2 and related proteins. TMEM2, XP051860 and the newly described sequence in the filamentous bacterium C. aurantiacus have a single copy of the TMEM repeat; in the first two proteins it is interrupted by additional sequence (see FIG. 2 c). In the TMEM2, XP051860 and C. aurantiacus proteins, as in the fibrocystins, this region is predicted to be extracellular. As C. aurantiacus is the only sequenced prokaryote with this protein domain, it appears likely that this may be an example of horizontal gene transfer.

Example 6 Fibrocystin-L Orthologs and Homologs

Murine fibrocystin-L is predicted to have 4249 aa with an overall identity of 81.9% and similarity of 90.0% to the human protein. This is higher than the corresponding figures of 72.6% and 83.1% for human and murine fibrocystin. The murine fibrocystin-L is predicted to have a signal peptide with cleavage at the corresponding position to the human protein and a similarly located single transmembrane domain, leaving a 6 residue C-terminal tail. Murine fibrocystin-L is also predicted to have 14 TIG domains and similar TMEM homology. Fifty-six N-linked glycosylation sites are predicted in the extracellular region of the protein but the C-terminal tail does not contain a PKC site.

BLAST analysis for related proteins in other species where the complete genomic sequence is available showed a fibrocystin-L ortholog in the fish Takifugu rubripes. Strong similarity is seen with several predicted proteins and the corresponding genome sequence, Scaffold 2621. The Fugu PKHDL1 ortholog is encoded by a genomic region of ˜30 kb. Interestingly, analysis of Fugu genomic sequence with fibrocystin only identified the PKHDL1 ortholog, but with a much lower score and less significant E value than with fibrocystin-L. This indicates that Fugu has only PKHDL1 and no PKHD1 ortholog. Analysis of available genomic sequence from other eukaryotes and prokaryotes revealed no clear full-length orthologs of PKHD1 or PKHDL1. However, other significant regions of homology were detected in these species. The strongest homology was with the TMEM domain in C. aurantiacus as described above. The next most significant region was with a conserved hypothetical protein from the bacterium Thermoanaerobacter tengcongensis that has 11 TIG domains, two from 246 aa-332 aa and nine tandemly arranged from 580 aa. This protein also has a fibronectin type III domain, and a signal peptide indicating that it is a secreted protein. Other high scoring homologies were with other TIG domain proteins, most notably plexin-like proteins, which typically have four such domains.

Example 7 EST Analysis

Expression data for fibrocystin-L was examined in human and mouse EST libraries (Table 6) using the NCBI database on the World Wide Web at ncbi.nlm.nih.gov/UniGene/. There was an overrepresentation of PKHDL1 expression (16/58 clones; 27.5%) in human ESTs originating from endometrial adenocarcinoma. Fifty percent (8/16) were from well-differentiated endometrial adenocarcinoma, 6/16 (37.5%) from moderately differentiated adenocarcinoma, and 2/16 (12.5%) originated from poorly differentiated tumors. A number of other epithelial cancers show upregulated expression of fibrocystin-L. The mouse ESTs occurred most commonly in thymus and pituitary gland (Rathke's pouch).

TABLE 6 Tissue distribution of ESTs for PKHDL1 in human and mouse in Unigene database.

Human                               Number  Mouse                       Number
Uterus (adenocarcinoma)             16      Thymus                      5
Other                               8       Pituitary - Rathke's Pouch  4
Germinal B cell center; lymph node  7       Embryo                      3
Liver                               4       Lymph node                  2
Brain                               3       Mixed                       2
Pancreas                            3       Brain                       1
Adrenal                             2       Whole mouse                 1
Muscle                              2       One cell embryo             1
Mixed                               2       Urinary bladder             1
Pooled glandular                    2       Spleen                      1
Head and neck                       1
Colon tumor                         1
Fetal                               1
Pooled organ                        1
Mixed embryo                        1
Vascular                            1
Adipose                             1
Larynx                              1
Lung                                1
Pituitary                           1
Total Human ESTs                    58      Total Mouse ESTs            21

RT-PCR in human and mouse tissue and cell lines demonstrated PKHDL1 expression in kidney, adrenal, brain, liver, lung, placenta, ovary, and tonsil. A limited number of human cancer tissues were also examined. By semi-quantitative RT-PCR using GAPDH as a control, PKHDL1 was observed to be upregulated 1.7× in endometrial cancer compared with the levels seen in normal endometrium.
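A fold change of this kind is usually obtained by normalizing the target band to the GAPDH band in each sample and then taking the ratio of the normalized values. The sketch below illustrates that arithmetic only; the band intensities are hypothetical numbers chosen to give a 1.7-fold result and are not data from this study, and the function name is an assumption.

```python
def normalized_fold_change(target_test, control_test, target_ref, control_ref):
    """Fold change of a target transcript after normalizing each sample to its control gene."""
    test_ratio = target_test / control_test   # e.g. PKHDL1 / GAPDH in tumor
    ref_ratio = target_ref / control_ref      # e.g. PKHDL1 / GAPDH in normal tissue
    return test_ratio / ref_ratio


# Hypothetical densitometry values (arbitrary units), for illustration only.
fold = normalized_fold_change(
    target_test=850.0, control_test=1000.0,   # tumor: PKHDL1, GAPDH
    target_ref=500.0, control_ref=1000.0,     # normal: PKHDL1, GAPDH
)
print(round(fold, 2))
```

Normalizing to the control gene first keeps differences in RNA input or loading between the two samples from being read as a change in PKHDL1 expression.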

Example 8 Manufacture and Characterization of Fibrocystin-L Antibodies

A NusA (His)₆-tagged N-terminal human fibrocystin-L expressed protein was produced as follows and used to immunize mice. A fragment of PKHDL1 containing the 5′ sequence of PKHDL1, including the sequence for the third TIG domain of fibrocystin-L (amino acids 288-656), was amplified from EST clone ADBBEB10 (human adrenal tissue) and cloned into pZERO, then subcloned into a modified pET-43a⁺ (Novagen) vector with a tobacco etch virus (TEV) protease site between the NusA/His₆ ORF and the PKHDL1 region. See FIGS. 8A and 8B. The (His)₆ NusA fibrocystin-L fusion protein was produced in E. coli AD494 and DE3 strains; soluble protein fractions were purified by metal affinity chromatography on an imidazole gradient followed by a round of size exclusion chromatography using a Superdex 200 column. The calculated molecular weight of the fusion protein, 102.86 kDa including the NusA construct, was confirmed by running IPTG-induced bacterial cell lysates on an SDS-PAGE gel.

The PKHDL1 cDNA described above also was used to transfect PEAK cells (Edge Biosystems). Cells were transfected with the plasmid construct using Lipofectamine® (Invitrogen) and stable transfectants were selected using puromycin (2 μg/mL). This construct contains a C-terminal Pk tag (a 14 amino acid sequence derived from the P and V proteins of the paramyxovirus Simian Virus 5 (SV5)) and an N-terminal FLAG tag after the signal peptide. The cDNA was prepared by utilizing an adaptor primer to obliterate a BamHI site and generate a SapI site for subcloning of the coding sequence of PKHDL1 with SapI/NotI ends into a PEAK plasmid (a modified pIRES PURO/PEAK 10 plasmid).

BALB/c ByJ mice were immunized with a subcutaneous injection of the purified recombinant (His)₆NusA-tagged fibrocystin-L (uncleaved) protein (see FIG. 8C) emulsified in complete Freund's adjuvant (Difco) after a protein concentration step by centrifugation and buffer exchange to PBS (Millipore; Amicon Ultra centrifugal filter). Twenty-eight days after immunization, mice were screened for their immune responsiveness to the antigen.

Anti-sera from one mouse gave a strong positive band of the protein as confirmed by western blot and by overlay of tagged protein constructs. The identity of the prokaryotically expressed protein was confirmed by mass spectroscopy analysis of the expressed purified protein, which was excised from an acrylamide gel. Spleens were removed from immune-responsive mice three days after an intravenous boost with antigen.

Single cell suspensions were prepared and red blood cells were removed by lysis with ammonium chloride potassium buffer (ACK). Lymphocytes and F/O myeloma cells (non-secreting myeloma derived from SP2/0 BALB/c myeloma cells) were mixed in a ratio of 2:1 and centrifuged to form a cell pellet. The cell pellet was resuspended in 1 mL of a 50% solution of polyethylene glycol 1540 (Baker) and RPMI, then incubated at 37° C. for 90 seconds, washed and resuspended in fresh Iscove's Modified Dulbecco's Medium (IMDM) with 10% fetal bovine serum (Hyclone). Aliquots of 100 μl then were added to each well of five microtiter plates (Costar 3595). Twenty-four hours later, 100 μl IMDM culture medium was replaced with fresh medium supplemented with 1M Hypoxanthine, 4 mM Aminopterin and 0.16M Thymidine (HAT), which was added to each microtiter well. Every 3-4 days thereafter, 100 μl IMDM culture medium was replaced with fresh medium containing HAT, HT and complete medium without HAT successively over a period of approximately 14 days. Upon reaching 75% confluence, the culture supernatants were tested for the presence of fibrocystin-L antibody. The hybridomas of interest were cloned in limiting dilution cultures at 1 cell per microtiter well and later subcloned at 0.3 cells per microtiter well. BALB/c spleen cells served as a feeder layer for fusions (5×10⁴ per well), cloning, and subcloning (3×10⁶ per well). Positive subclones were isotyped and cryopreserved.
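The reason for subcloning at a lower seeding density can be illustrated with a standard Poisson model of limiting dilution, in which the number of cells landing in a well is Poisson distributed around the mean seeding density. This model is an assumption added here for illustration and is not described in this document; the function name is hypothetical.

```python
import math


def p_single_cell_given_growth(mean_cells_per_well):
    """P(well received exactly one cell | well received at least one cell), Poisson seeding."""
    lam = mean_cells_per_well
    p_exactly_one = lam * math.exp(-lam)
    p_at_least_one = 1.0 - math.exp(-lam)
    return p_exactly_one / p_at_least_one


# Seeding densities used in the cloning and subcloning steps above.
for density in (1.0, 0.3):
    print(density, round(p_single_cell_given_growth(density), 3))
```

Under this model roughly 58% of wells with growth at 1 cell per well started from a single cell, rising to roughly 86% at 0.3 cells per well, which is why the second round increases confidence that a line is monoclonal.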

Twenty-three reactive clones from 960 supernatants tested were detected in an initial screen using an Immunetics Miniblotter 28® system for western blotting and membrane preparations from transfected PEAK cells stably expressing the fibrocystin-L partial construct (confirmed by the detection of a GFP tag by immunofluorescence microscopy). See FIGS. 8C and 8D. Simultaneous western lanes were run using primary antibodies to FLAG and Pk tags as positive controls. The reaction was detected with goat anti-mouse IgG secondary antibodies conjugated to horseradish peroxidase (Dako), and developed using a chemiluminescent substrate with the results being recorded on film.

Initial positivity was obtained by western blot from 14/960 hybridoma supernatants. Seven cloning plates subsequently had band reactivity (from 192 wells), and monoclonal antibodies were generated (de St. Groth (1980) J. Immunol. Methods 35:1-21). Positive reactors underwent additional rounds of re-screening by western blot using the same system. All positive supernatants detected a single prominent reactive band migrating at ˜75 kDa, consistent with the size of the expressed tagged protein construct. No other reactive bands were detected from these lysates. Final repeated screenings of the chosen sub-cloned antibody supernatants from selected cells yielded 10 monoclonal antibodies (Table 7).

TABLE 7 Monoclonal antibodies generated to fibrocystin-L and isotype details

Name         Isotype
FibLA-2      IgG3 κ
FibLA-10.1   IgG1 κ
FibLA-10.2   IgG1 κ
FibLA-4.1    IgG1 λ
FibLA-4.2    IgG1 λ
FibLA-11.1   IgG1 λ
FibLA-11.2   IgG1 λ
FibLA-11.3   IgG1 λ
FibLA-13.1   IgG1 λ
FibLA-13.2   IgG1 λ

To confirm that these antibodies were sensitive and specific, each antibody was used to detect endogenous human fibrocystin-L in various tissues by western blotting. Peroxidase-conjugated secondary antibody (goat anti-mouse IgG) was used to detect the primary antibodies. Each antibody was able to detect endogenous fibrocystin-L.

Endometrial carcinoma tissue showed the highest level of expression of fibrocystin-L protein, followed by moderate expression in activated human T lymphocytes, kidney, spleen and the erythroleukemia K562 cell line. A high molecular weight protein was detected at ˜500 kDa, with a second smaller protein detected in some tissues. This smaller product was clearly present in kidney, endometrium, and tonsillar tissue. Endogenous fibrocystin-L also was detected by western blotting with mouse tissue membrane preparations; strongest levels were detected in kidney, followed by lung, thymus, liver, lymph node, spleen, and endometrium tissues using the FibLA-4.1 and FibLA-4.2 antibodies.

Example 8 Tissue Microarray Studies

Tissue microarray blocks were made from formalin-fixed paraffin-embedded archival tissue, including 60 normal human tissue sections. The expression of fibrocystin-L was studied in the tissue microarray blocks using the FibLA-4.1 and FibLA-11.3 antibodies and compared with the staining using an isotype control antibody. The intensity of staining was graded on a scale of 0-4. Expression levels of fibrocystin-L were demonstrated to be at a low level across many tissues (Table 8). Staining was most intense in fallopian tube epithelium in the normal tissue dataset. Control slides stained with IgG1 isotype control antibody were negative. Staining intensity also was very prominent in thyroid epithelium (apical). Other sites with prominent staining were liver parenchymal cells, particularly around the central veins (control liver sections were negative); adrenal cortex, in all three layers (zona fasciculata, reticularis and glomerulosa); gallbladder endothelium (apical and cytoplasmic); testis Leydig cells; breast duct epithelium; and spleen red pulp. Follicles, the sites of B cells, were negative.

Cilia are reported to occur in several of these unusual sites of staining. Cells of the testicular interstitium in fertile men (myoid cells, fibroblast-like cells and Leydig cells) have one cilium per cell in the 9+0 microtubule configuration (Takayama (1981) Int. J. Androl. 4:246-256). In thyroid follicular cells, cilia are thought to be in the 9+2 microtubule configuration. Although the number of cilia per cell seen in scanning electron microscopy studies has been debated, the most recent studies have determined that one 9+2 cilium is present on each follicular thyroid cell, with cilia extending into the follicular lumen and abnormal secondary cilia observed in studies of follicular carcinoma (Martin (1988) Virchows Arch. B Cell Pathol. Incl. Mol. Pathol. 55:159-166). Single cilia are reported in the cells of rat cortex and medulla (Wheatley (1967) J. Anat. 101:223-237). Ward et al. also have previously detected polycystin-1 in Leydig cells of the testis and the adrenal cortex (Ward et al. (1996) Proc. Natl. Acad. Sci. USA 93:1524-1528).

TABLE 8 Fibrocystin-L expression in normal human tissues

Tissue                      11.3  Intensity  Extent  4.1  Intensity  Extent
Thyroid                     1     2          3       1    1          2
Esophagus LS                1     1          3       0    0          0
Esophagus XS                1     0          0       1    0          0
Stomach Mucosa LS           1     1          2       1    0          0
Stomach Muscularis          1     0          0       1    1          3
Small Bowel                 1     0          0       1    0          0
Sigmoid Colon Mucosa        1     0          0       1    0          0
Sigmoid Colon Muscularis    1     0          0       0    0          0
Colon Submucosa             0     0          0       1    1          1
Tonsil                      1     1          3       1    2          3
Adrenal                     1     2          3       1    1          3
Pancreas                    1     1          1       1    1          2
Ureter                      1     1          3       1    1          4
Bladder Wall                1     0          0       1    0          0
Bladder Mucosa              1     0          0       0    0          0
Kidney                      1     2          3       1    2          3
Lymph Node (colon/rectum)   1     1          1       1    0          0
Spleen                      1     2          3       1    2          2
Thymus, preinvoluted        1     0          0       1    0          0
Thymus, involuted           1     0          0       1    0          0
Liver                       1     2          4       1    2          4
Skeletal Muscle             1     1          1       1    1          1
Lung (bronchioles)          1     1          1       1    0          0
Heart (epicardium)          1     0          0       0    0          0
Heart (myocardium)          1     1          1       1    0          0
Fallopian tube              1     3          4       1    3          4
Endometrium                 1     1          2       1    0          0
Breast Stroma               1     0          0       1    0          0
Breast Ducts                1     1          4       1    0          0
Hypodermis                  1     0          0       0    0          0
Skin (thin)                 1     0          0       1    0          0
Hippocampus                 1     1          4       1    0          0
Cervix                      1     1          3       1    0          0
Placenta                    1     1          4       1    1          3
Testis                      1     2          1       1    2          1
Gall Bladder                1     2          4       1    1          4
Spleen                      1     2          2       1    2          2
Prostate                    1     2          3       1    1          2

Example 9 Fibrocystin-L Expression in Kidney

Kidney tissue was examined by western blot and immunohistochemistry. There appeared to be predominantly proximal but also some distal tubule staining (both cytoplasmic in distribution). A granular pattern of cytoplasmic distal tubule type of staining also was detected, but it is possible this was artefactual. No staining was detected using an isotype control IgG1 primary antibody simultaneously tested. Positively staining proteinaceous casts also were found in medullary collecting ducts from normal kidney. Expression levels of fibrocystin-L were much lower when compared with fibrocystin levels detected using immunohistochemistry.

Example 10 Studies of Fibrocystin-L in Endometrium

As there was an overrepresentation of PKHDL1 cDNAs in endometrial carcinoma tissue, and since female reproductive tract epithelium (normal fallopian tube) appeared to have the highest expression levels of the protein, human endometrial carcinoma sections were examined in greater detail using additional tissue microarrays. Custom-made high quality microarrays were obtained through the Mayo Clinic tissue array core facility and Mayo Clinic Endometrial Cancer Working group as follows.

Five-micron sections were made from formalin-fixed, paraffin-embedded archival tissue using 1-mm punches, mounted on glass slides, and dewaxed. Endogenous peroxidase activity was blocked with 0.03% hydrogen peroxide, all sections were subjected to heat-induced epitope retrieval by steaming in EDTA for 40 minutes, and nonspecific binding sites were blocked in 5% bovine serum albumin (Sigma) in PBS (pH 7.5). Immunohistochemical staining was performed on a Dako Autostainer (DakoCytomation, Carpinteria, Calif.). Staining was performed using the Dako EnVision+ System HRP with 3,3′-diaminobenzidine (DAB) (Dako; Code K4006) and primary monoclonal antibody against fibrocystin-L (4.1 and 11.3 were found to be consistent for immunochemistry and western blotting). This system uses a two-step staining procedure that employs HRP-labeled polymer conjugated with secondary antibodies. The labeled polymer does not contain avidin or biotin and, as such, nonspecific staining from endogenous avidin-biotin activity, which is often problematic in kidney and liver tissue, was avoided. The tissue was immunostained with immunohistochemical staining optimized for human endometrium tissue. After deparaffinization in xylene, slides were rehydrated through a graded series of alcohol and placed in TBS-Tween 0.1%, followed by a 5-minute block of the endogenous peroxidase using the Dako EnVision+ horseradish peroxidase kit. Antigen retrieval was then performed by steaming in 1 mM EDTA pH 8.0 for 40 minutes, and slides were then rinsed and washed in TBS-Tween 0.1%. The primary antibody (4.1; 1:50 dilution in fish blocking buffer) or mouse IgG1 isotype control (1:500 dilution; R&D Systems; Catalog MAB002) was incubated on the tissue in a humidified container for 30 minutes, then washed in TBS-Tween 0.1% for 5 minutes on a horizontal shaker. The secondary antibody was added and incubated again for 30 minutes. After a further 5-minute wash step in TBS-Tween 0.1%, the peroxidase activity was visualized by incubating with DAB at room temperature for 10 minutes, then rinsing in H₂O. The slides then were counterstained in 1:20 hematoxylin for two minutes at room temperature.

Stained slides were scanned and the digitized images were available for viewing along with the grid overlay (linked to an Excel database containing the specimen information and pathologist's annotations) on the Bacus Webslide® Browser software system (World Wide Web at bacuslabs.com) from a desktop PC. Grading of the normal human tissue slides for fibrocystin-L was performed by an experienced pathologist relative to epithelial staining intensity in the corresponding negative control slide. The reviewer assigned a score of 0 for no staining, 1+ for weak staining, 2+ for moderate staining, and 3+ for strong staining. Eight slides containing human endometrial tissue, carcinoma of the endometrium in the same patients and other miscellaneous cancer tissues from the same index cases were also evaluated. Endometrial cancer slides are currently being graded by an experienced pathologist with specific expertise in endometrial pathology. A preliminary analysis of the staining in these tissues was performed using a scale of 0-3 grading for immunoreactivity. Fifty-eight patient samples from a total of 191 cases of endometrial carcinoma were excluded from the paired analysis due to the absence of either the index normal or endometrial carcinoma tissue punch or an insufficient tissue section available for quantitation.

The microarrays from this cohort of cancer patients incorporated not only normal and endometrial cancer tissues but also additional sections of synchronous cancers occurring in 521 patients. There was widespread low intensity epithelial expression in the normal tissues but marked upregulation of the protein in the endometrial cancer tissue. Staining also was localized in the ciliated epithelium of the endometrium and in several ovarian cancers. Fibrocystin-L immunoreactivity was variable within normal endometrium, with the majority of cases showing weak fibrocystin-L epithelial expression. A low level of stromal immunoreactivity also was seen in the majority of normal and endometrial cancer specimens. The staining was cytoplasmic in normal endometrial endothelium. High levels of fibrocystin-L expression seemed to be localized in the apical membranes of endometrial cancer tissue in many of the cases where upregulated expression was seen. Staining in fallopian tube carcinomas also was examined (n=2), but the intensity seemed less than the levels detected in normal fallopian tube endothelium.

Example 11 Analysis of Fibrocystin-L in Endometrial Cancer Microarray Cohort

Of 521 patient samples in the dataset, 135 patients had staining data from both normal and cancer tissues. Eighty-six of 133 (65%) patient sections showed upregulation of the protein (median staining intensity) and the minority of cases showed either unchanged (43/133; 32%) or down-regulation (4/133; 3%) of the protein compared with normal endometrium from these same patients. Paired analysis (using median intensity of staining of normal and tumor tissues) demonstrated there was a statistically significant upregulation of fibrocystin-L in the cancers (p<0.0001; Wilcoxon signed rank test (n=135)). Similar analysis using maximal staining intensity demonstrated a statistically significant difference in the staining intensity (P<0.0001; t test) in these two groups, and contingency table analysis using maximal grade intensity also demonstrated a statistically significant difference in the staining intensity (P<0.0027; Pearson test; see Table 9).
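The paired comparison described above can be sketched as follows. This is an illustrative Python example under stated assumptions, not the software or exact procedure used in the study: it tallies paired (normal, tumor) scores as upregulated, unchanged, or down-regulated, and computes a Wilcoxon signed rank statistic using the large-sample normal approximation with a tie correction and no continuity correction. The function names and the example pairs are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt


def classify_pairs(pairs):
    """Tally paired (normal, tumor) scores as up, unchanged, or down."""
    tally = Counter()
    for normal, tumor in pairs:
        if tumor > normal:
            tally["up"] += 1
        elif tumor < normal:
            tally["down"] += 1
        else:
            tally["unchanged"] += 1
    return tally


def wilcoxon_signed_rank(pairs):
    """Return (W_plus, W_minus, two-sided p) via the normal approximation.

    Zero differences are dropped and tied absolute differences share the
    average rank; no continuity correction is applied.
    """
    diffs = [tumor - normal for normal, tumor in pairs if tumor != normal]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        group_size = j - i
        tie_term += group_size**3 - group_size
        i = j
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return w_plus, w_minus, p_value


# Hypothetical (normal, tumor) median grades; not data from the study.
pairs = [(1, 2), (1, 3), (0, 1), (2, 2), (1, 1), (2, 1), (0, 2), (1, 2)]
tally = classify_pairs(pairs)
w_plus, w_minus, p_value = wilcoxon_signed_rank(pairs)
```

With a four-level grading scale, ties are frequent, which is why the sketch applies average ranks and the tie-corrected variance; an exact test or an established statistics package would be preferable for small samples.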

There was no correlation between expression level of this protein (high, medium, or low) and survival (P=0.67; log rank test) or recurrence rates (P=0.96; NS; log rank test). No correlation was observed between staining intensity of fibrocystin-L and histologic grade (n=178; P=0.10; Pearson), cancer stage, body mass index, age, depth of myometrial invasion, vaginal recurrence, or hematologic spread. There did seem to be a trend toward significance between fibrocystin-L staining intensity and nodal invasion (P=0.19; NS).

TABLE 9
Contingency table analysis of maximum grade of staining intensity in cancer of endometrium (Y axis) by maximum grade observed in normal tissue sections (X axis)

Cancer grade   Measure     Normal 0   Normal 1   Normal 2   Normal 3   Total (%)
0              Count       20         3          0          0          23
               Total %     14.81      2.22       0          0          17.04
               Column %    27.78      6.25       0          0
               Row %       86.96      13.04      0          0
1              Count       16         21         1          1          39
               Total %     11.85      15.56      0.74       0.74       28.89
               Column %    22.22      43.75      7.69       50.00
               Row %       41.03      53.85      2.56       2.56
2              Count       16         16         6          1          39
               Total %     11.85      11.85      4.44       0.74       28.89
               Column %    22.22      33.33      46.15      50.00
               Row %       41.03      41.03      15.38      2.56
3              Count       20         8          6          0          34
               Total %     14.81      5.93       4.44       0          25.19
               Column %    27.78      16.67      46.15      0
               Row %       58.82      23.53      17.65      0
Total          Count       72         48         13         2          135
               Total %     53.33      35.56      9.63       1.48
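The percentages in Table 9 follow directly from the cell counts. The Python sketch below recomputes the total, column, and row percentages from the counts transcribed from the table; the function name `contingency_percentages` is a hypothetical helper introduced for illustration and is not part of the original analysis.

```python
# Counts transcribed from Table 9: rows are the maximum grade in cancer (0-3),
# columns are the maximum grade in normal tissue (0-3).
COUNTS = [
    [20, 3, 0, 0],
    [16, 21, 1, 1],
    [16, 16, 6, 1],
    [20, 8, 6, 0],
]


def contingency_percentages(counts):
    """Return per-cell percentages plus the row, column, and grand totals."""
    total = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    cells = []
    for i, row in enumerate(counts):
        out_row = []
        for j, count in enumerate(row):
            out_row.append({
                "count": count,
                "total_pct": round(100 * count / total, 2),
                # Guard against empty rows or columns to avoid division by zero.
                "col_pct": round(100 * count / col_totals[j], 2) if col_totals[j] else 0.0,
                "row_pct": round(100 * count / row_totals[i], 2) if row_totals[i] else 0.0,
            })
        cells.append(out_row)
    return cells, row_totals, col_totals, total


cells, row_totals, col_totals, total = contingency_percentages(COUNTS)
```

Each cell's total percentage divides by the grand total, the column percentage by the column sum, and the row percentage by the row sum, which is the order in which the four values appear for each grade in the table.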

Example 12 Expression of Fibrocystin-L in Other Human Cancers

Immunohistochemistry also was used to examine the extent of fibrocystin-L staining in the 109 cases of synchronous cancers from this patient cohort (Table 10). A subset of the breast, ovarian and colon cancers seemed to have significant staining of fibrocystin-L. Eleven out of twenty-two (50%) ovarian cancers and five out of thirty (20%) breast cancers had grade 2 or 3 staining intensity.

Expression of fibrocystin-L also was assessed in K562 erythroleukemia cells using the FibLA-4.1 monoclonal antibody. Fibrocystin-L was present in abundant amounts in the cytoplasm of K562 cells (confocal and immunofluorescence microscopy data), as indicated in images of fixed, permeabilized K562 cells grown in culture, and also was detectable in membrane preparations from this cell line by western blot.

TABLE 10
Analysis of fibrocystin-L immunostaining in synchronous/metachronous cancers studied in patients with endometrial carcinoma

Tissue              Number   No. (%) with grade 2 or 3 intensity staining
Breast              30       5/30 (20%)
Ovary               22       11/22 (50%)
Colon               20       3/20 (15%)
Lymphoma            6        0/6 (0%)
Lung                5        2/5 (40%)
Thyroid             5        2/5 (40%)
Melanoma            2        1/2 (50%)
Omentum             4        1/4 (25%)
Bladder             2        0/2 (0%)
Skin/Vulva          2        0/2 (0%)
Skin (scalp)        1        0/0 (0%)
Mouth/Tongue        2        1/2 (50%)
Myeloma             1        1/1 (100%)
Renal clear cell    1        1/1 (100%)
Stomach             2        0 (0%)
GIST                1        0 (0%)
Perineal            1        0 (0%)
Appendix            1        2 (100%)
Cecum               1        0 (0%)

GIST: gastrointestinal stromal tumor

Other Embodiments

It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.

1. An antibody having specific binding affinity for a fibrocystin-L polypeptide.
2. The antibody of claim 1, wherein the fibrocystin-L polypeptide is encoded by SEQ ID NO:1.
3. The antibody of claim 1, wherein the fibrocystin-L polypeptide is encoded by SEQ ID NO:2.
4. The antibody of claim 1, wherein the fibrocystin-L polypeptide comprises the amino acid sequence of SEQ ID NO:3.
5. The antibody of claim 1, wherein the fibrocystin-L polypeptide comprises the amino acid sequence of SEQ ID NO:4.
6. The antibody of claim 1, wherein the antibody is a polyclonal antibody.
7. The antibody of claim 1, wherein the antibody is a monoclonal antibody.
8. The antibody of claim 1, wherein the antibody is a humanized antibody.
9. The antibody of claim 1, wherein the antibody is a chimeric antibody.
10. The antibody of claim 1, wherein the antibody is a single chain Fv antibody fragment.
11. The antibody of claim 1, wherein the antibody is an Fab fragment.
12. The antibody of claim 1, wherein the antibody is an F(ab)₂ fragment.
13. A method for determining whether a subject has endometrial cancer, comprising: (a) using the antibody of claim 1 to measure the level of fibrocystin-L polypeptide in a biological sample from the subject, and (b) determining that the subject has endometrial cancer if the level of fibrocystin-L polypeptide in the biological sample is increased relative to the level in a corresponding control sample, or determining that the subject does not have endometrial cancer if the level of fibrocystin-L polypeptide in the biological sample is not increased relative to the level in the corresponding control sample.
14. The method of claim 13, wherein step (a) comprises using immunohistochemistry, western blotting, or ELISA.
15. A method for determining whether a biological sample from a subject contains cancer cells, comprising: (a) measuring the level of fibrocystin-L in the biological sample by contacting the sample with the antibody of claim 1, and (b) determining that the sample contains cancer cells if the level of fibrocystin-L in the sample is increased relative to a control level, or determining that the sample does not contain cancer cells if the level of fibrocystin-L in the sample is not increased relative to the control level.
16. The method of claim 15, wherein the biological sample is an endometrial, breast, ovarian, lung, or colon sample.
17. The method of claim 15, wherein the sample is a bodily fluid sample.
18. A method for measuring the level of fibrocystin-L or fragments thereof in a sample of bodily fluid, the method comprising contacting the sample with the antibody of claim 1, and measuring the level of fibrocystin-L or fragments thereof based on the amount of antibody bound to the sample.
19. The method of claim 18, wherein the bodily fluid comprises blood, serum, or urine.
20. The method of claim 18, wherein the bodily fluid comprises cells, and wherein the method comprises measuring the amount of antibody bound to the cells in the sample.